arm 1176-jzfs cpu-based low-power subsystem
Post on 23-Oct-2014
A Practical Guide to Low-Power Design: User Experience with CPF
ARM 1176-JZFS CPU-Based Low-Power Subsystem
ARM 1176-JZFS CPU-Based Low-Power Subsystem: Methodology to Reduce Electrical and Functional Failure in a Low-Power Design
By David Flynn, Fellow, ARM; Sachin Idgunji, Architect, ARM; Felix Jen, Manager Design Implementation, UMC; Wen-Pin Lin, Senior Manager, UMC; and Vivek Shukla, Cadence Architect, Bangalore.
Abstract
Leakage control has become a major design issue because leakage currents drain a battery’s charge even when a device is inactive or in standby mode. Transistors in each new process generation leak more than those in previous generations due to transistor scaling effects, which only exacerbates the problem.
A few years ago, designers began using power shut-off in their designs, and EDA suppliers provided low-power methodology solutions. However, power shut-off created next-level issues: performance, wear-out of power switches, added complexity in power switch analysis, system-level performance impact from power-up time, test, and reliability. These issues require an accompanying ASIC implementation and verification methodology to reduce the risk of chip failure, both functional and electrical. We demonstrate the application of these techniques and the methodology on an ARM1176-JZFS CPU-based system targeted for a 65nm technology node, which achieves higher speed with lower leakage, using a methodology that reduces post-silicon electrical failure.
Overview of Ulterior Project
Figure 1: Collaboration and Contributors
[Figure 1 labels: collaboration between leaders to deliver the low-power solution. Contributions shown: dual-Vt technology, 65nm technology models, and SoC implementation; IP and power management; memory compiler with memory shut-offs, standard cells, and PMK library; complete implementation methodology for the Ulterior CPU.]
Joint Collaboration Contributors: This effort has been jointly executed by ARM, UMC, and Cadence to accomplish the following tapeout and silicon measurements.
• UMC: 65nm standard process
− Looking for performance and yield on the LP implementation
• ARM: ARM1176JZFS based SoC to demonstrate power management on a high performance design
− Low-power architecture
− Power management and low-power memory IP for managing leakage
• Cadence: CPU implementation
− Complete low-power tool and methodology support
UMC Technology Trends and Process Selection for Project
This section discusses the process parameters and process selection. Figure 2 illustrates the process node used in this project and its evolution from the 90nm process.
Figure 2: UMC Technology Trends
The Low Leakage (LL) process running at 1.2V delivers approximately half the performance of the Standard Process (SP) running at 1.0V (Figure 3).
Technology node                 L90 1P9M                   L65 1P10M
Process                         SP/LL                      SP/LL
Lithography                     193nm Dry                  193nm Dry
Core Voltage (V)                1.0/1.2                    1.0/1.2
tox Core (A)                    16/22                      12/19
tox IO (A) (IO Vdd)             30/52/65 (1.8/2.5/3.3V)    30/52/65 (1.8/2.5/3.3V)
Physical Gate Length (nm)       70/80                      40/55
Salicide                        CoSi2                      NiSi
Interconnect                    Cu                         Cu
Inter/Intra Metal Dielectric    Low-k (k=2.9)              Low-k (k=2.9)
1X Metal Pitch (nm)             280                        200
6T SRAM Cell Size (um2)         1.16/0.99*                 0.499*/0.525*
*Cell non-shrinkable
Figure 3: Managing Performance per Watt
As shown in Figure 4, Low Leakage (LL) Nodes gain significantly (>80x) across the process space. They are highly sensitive to temperature (sub-threshold component). High Performance Nodes gain an average of 25% on Drive Strength. This is dominated by the process spread.
Figure 4: Low Leakage vs. Performance Node Tradeoffs
High Performance Nodes gain significantly (average 30%) across the corners. The power dissipation can be managed effectively with voltage scaling. Multi-channel devices can be used to reduce the leakage.
[Figure 3 plot: normalized Ioff (pA/um, log scale) versus intrinsic R.O. delay (ps/stage) for 65SP 1.0V, 65LL 1.2V, 90SP 1.0V, and 90LL 1.2V.]
[Figure 4 plots: "Leakage Gain (LL vs SP)" and "Drive strength gain (SP vs LL)", each showing gain per corner (TT, SS, FF, SF, FS at several temperatures) as NMOS and PMOS ratios.]
Figure 5: Performance Gain and Impact of Voltage Scaling
Delay for some key structures scales as V^(-α), with α in the range 1.5 to 2. As shown in Figure 6 and Figure 7, temperature sensitivity decreases as the voltage is lowered (the zero-temperature-coefficient point for the block is around 0.78V). Variability is highly sensitive to voltage and increases drastically at lower voltages, which impacts the functionality of the design.
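As a rough illustration of these two trends, the sketch below evaluates the V^(-α) delay model and the power-law fit printed on the Figure 7 plot, read here as y = 16.396x^(-3.0055). The function names and the 1.0V reference point are assumptions made for this example, not part of the original analysis.

```python
# Illustrative sketch only: function names and the 1.0V reference are assumptions.

def relative_delay(v, v_ref=1.0, alpha=1.5):
    """Delay at supply v, normalized to the delay at v_ref, assuming delay ~ V**-alpha."""
    return (v / v_ref) ** (-alpha)


def fitted_variability(v):
    """Power-law fit read from the Figure 7 plot (std dev, arbitrary units)."""
    return 16.396 * v ** -3.0055


if __name__ == "__main__":
    for v in (1.2, 1.0, 0.8):
        low = relative_delay(v, alpha=1.5)
        high = relative_delay(v, alpha=2.0)
        print(f"V={v:.1f}V: delay x{low:.2f} to x{high:.2f}, "
              f"variability {fitted_variability(v):.1f}")
```

Under these assumptions, lowering the supply from 1.0V to 0.8V slows the modeled structures by roughly 1.4x to 1.6x while the fitted variability roughly doubles, which is the tradeoff the paragraph above describes.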
Figure 6: Delay Dependencies
[Figure 5 plots: "Performance Gain (SP vs LL)", gain per corner for Block 1 and Block 2; and "Voltage scaling on performance and power", ratio (SP to LL) of speed and dynamic power versus voltage from 0.5 to 1.1.]
[Figure 6 plots: "Voltage vs Delay (Average)", normalized delay versus voltage (0.75V to 1.25V) for Block 1, Block 2, and Block 3; and "Delay sensitivity to temperature", delay slope with temperature versus voltage (0.85 to 1.1).]
Figure 7: Variability
ARM1176JZFS-Based SoC Overview and Advanced Leakage Control
Figure 8 illustrates the block diagram of the ARM 1176_1616 CPU and its subsystem (Ulterior chip). Here are the key design features:
a. ARM1176 CPU with dual 16K caches
• With State Retention Power Gating (SRPG) leakage management
• Multi-voltage design (VRAM, VCPU, VSOC), but not DVS, although it will include level shifters
• Support for independent power/energy analysis
• Diagnostic SRPG error rate analysis
b. ARM AXI-based system-on-chip support logic
• SDRAM and Flash memory controllers
• IEM-based performance and leakage controllers
• Level-2 RAM on-chip memory (with BIST)
c. Linux OS port peripherals
• Demonstration of the entire system running real applications
[Figure 7 plot: "Voltage vs Variability", observed standard deviation (arbitrary units) versus core voltage (0.7V to 1.2V); dispersion data with power-law fit y = 16.396x^(-3.0055).]
Figure 8: Architecture of ARM 1176
Ulterior Power Strategy
Ulterior consists of two switchable domains and one always-on domain. The VDDCore power net is a switched power grid derived from VDDCPU. VDDCPU itself can be switched off externally and can run at a different typical voltage than VDDSOC. VDDSOC is the always-on supply for the chip and feeds the small amount of logic that must remain always on. VDDCore, which can be switched off, has multi-stage turn-on and turn-off control driven by the advanced leakage controller. All the flops in the VDDCore domain are retention flops, while all the memories in the VDDCPU domain can work in three low-power modes. Figure 9 shows the logical power domain definition for the ARM1176_1616 core.
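The supply hierarchy described above can be captured in a small data model. This is a minimal sketch for reasoning about which domains are live in a given state; the `Domain` class and `is_powered` helper are hypothetical names introduced here, not part of the project's CPF or tooling.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Domain:
    name: str
    parent: Optional[str]  # supply this domain is derived from, if any
    switchable: bool


# Domain names follow the text; the structure itself is an illustrative assumption.
DOMAINS = {
    "VDDSOC": Domain("VDDSOC", None, False),       # always-on supply
    "VDDCPU": Domain("VDDCPU", None, True),        # can be switched off externally
    "VDDCore": Domain("VDDCore", "VDDCPU", True),  # on-chip switched grid derived from VDDCPU
}


def is_powered(name: str, switched_on: Set[str]) -> bool:
    """Return True if the domain is live, given the set of switchable domains turned on."""
    domain = DOMAINS[name]
    if domain.switchable and name not in switched_on:
        return False
    return domain.parent is None or is_powered(domain.parent, switched_on)
```

For example, `is_powered("VDDCore", {"VDDCore"})` is false because the parent VDDCPU supply is off, which mirrors the statement that VDDCore is derived from VDDCPU.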
Figure 9: Power Domains on Ulterior
[Figure 8 block diagram (ARM 1176_1616 SoC): ARM1176 with SRPG CPU/cache and JTAG debug; 64-bit wide "Level-2" banked SRAM; 64-bit AXI interconnection matrix; ALVC leakage control; AHB/APB bridge; PLL and clock generation; Timer x2, UART x2, INTC, GPIO; 16-bit Flash and 32-bit SDRAM interfaces. The core block shows the AMBA AXI interface, memory management, DFT/MBIST, debug interface, 16K instruction and data caches, TCRAM 0/1 interfaces, and the TrustZone-enabled ARM11 core.]
[Figure 9 diagram (Ulterior ARM1176 voltage domains, logical): macrocell floorplan of the data, tag, valid, dirty, BTAC, TLB, and DTC/ITC RAMs around the ARM1176 core, annotated with VDDCPU (can be switched off externally), VDDSOC (always-on logic), and VVDDCPU (gated logic). Diagram note: need to check that all output ports are clamped.]
To reduce wear-out of the power switches while maintaining performance, Ulterior uses two kinds of power switch matrices: the weak network and the strong network. Control for the two networks comes separately from the Advanced Leakage Controller (ALC). The weak network has 8 power shut-off request inputs with matching acknowledges; the strong network has one shut-off enable request. The weak resistive network brings up the virtual grid gently, with sufficient current to ensure that VVDD reaches 0.95*VDD at high temperature. The strong matrix is turned on once the virtual grid reaches 0.95*VDD, to reduce the IR drop. The weak network is implemented with 8 enables so that wear-out experiments can be carried out with 1, 2, 4, or 8 enables. The selection of all power control acknowledge signals is based on STA measurements. Figure 10 shows one example sequence, where 8≥N≥1.
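The two-stage power-up can be sketched as a toy behavioral model: the virtual rail charges through N parallel weak paths, and the strong network is requested once the rail crosses 0.95*VDD. The first-order RC behavior, the `tau_one` time constant, and the `power_up` function are assumptions for illustration only; the project itself used spice-level simulation for this analysis.

```python
def power_up(n_weak, vdd=1.0, threshold=0.95, tau_one=100.0, dt=1.0, max_steps=10_000):
    """Toy model of the weak-then-strong power-up sequence.

    Returns (steps until the strong network would be enabled, peak charging current).
    tau_one is a hypothetical time constant for a single weak enable, in time steps.
    """
    if not 1 <= n_weak <= 8:
        raise ValueError("n_weak must be between 1 and 8")
    vvdd = 0.0
    peak = 0.0
    for step in range(1, max_steps + 1):
        # Weak phase: n_weak parallel resistive paths, so charging is n_weak times faster.
        current = n_weak * (vdd - vvdd) / tau_one  # arbitrary units
        peak = max(peak, current)
        vvdd += current * dt
        if vvdd >= threshold * vdd:
            return step, peak  # the controller would now assert the strong request
    raise RuntimeError("virtual rail never reached the threshold")
```

In this model, going from 1 to 8 weak enables shortens the ramp and raises the peak current by the same factor of 8, which is the qualitative rush-current versus ramp-time tradeoff that the multiple enables are meant to explore.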
Figure 10: Power Switch Request and Acknowledge (8≥N≥1)
[Figure 10 waveform signals: VDDCPU, VVDDCPU, PWR_REQ, PWR_ACK, CLOCK, N_ISOLATE, N_RESET, N_PWR_REQ[N], N_PWR_ACK[N].]
The memory subsystem contains 37 single-port memories; each memory can work in three low-power modes (Figure 11):
a. Standby mode (HALT)
• CEN disables the memory
b. Retention mode (SRPG)
• Power is supplied to the core array to retain state
• Power is off for the periphery for reduced leakage
• Outputs are clamped to zero
c. Shutdown mode (HIBERNATE)
• Power is off for core and periphery for reduced leakage
• Outputs are clamped to zero
• Possible through both integrated MTCMOS and separated power sources for core and periphery
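The three modes listed above can be summarized as a lookup table. The sketch below is a hypothetical encoding: the `ACTIVE` entry and the rail states for HALT (both rails assumed on, since the text only says CEN disables the memory) are assumptions added for completeness.

```python
from enum import Enum


class MemMode(Enum):
    ACTIVE = "active"       # assumption: normal operation, not listed in the text
    HALT = "standby"        # CEN disables the memory
    SRPG = "retention"      # core array powered, periphery off
    HIBERNATE = "shutdown"  # core array and periphery off


# (core array powered, periphery powered, outputs clamped to zero)
RAILS = {
    MemMode.ACTIVE: (True, True, False),
    MemMode.HALT: (True, True, False),  # assumption: both rails stay on in standby
    MemMode.SRPG: (True, False, True),
    MemMode.HIBERNATE: (False, False, True),
}


def retains_state(mode: MemMode) -> bool:
    """Memory contents survive only while the core array stays powered."""
    return RAILS[mode][0]


def outputs_clamped(mode: MemMode) -> bool:
    """Outputs are clamped to zero whenever the periphery is powered off."""
    return RAILS[mode][2]
```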
Figure 11: Memory Retention Power Gating
Implementation Overview
Figure 12 illustrates the Cadence CPF-based low-power implementation flow, with the following key highlights.
• Single CPF used from synthesis to backend, power, and timing sign-off
• Leakage optimizations in the synthesis and backend flows
• CPF-based MMMC flow in the Encounter platform
• PSO planning flow to meet performance/electrical/power goals
• Automated power switch network simulation for multiple combinations
• CRC model based spice simulation to reduce TaT for complex power switch analysis
• Comprehensive IR/EM checks
• Low-power verification throughout the flow
[Figure 11 labels: memory array with word lines, column decoders, and sense amps and I/O; CORE VDD/CORE VSS and LOGIC VDD/LOGIC VSS rails; HVt switches with PGEN and RETN controls.]
Figure 12: Ulterior Implementation Flow
[Figure 12 flow: a single CPF drives RTL low-power simulation and automatic assertion generation/checks (Incisive Enterprise Simulator), PD-aware logic synthesis and DFT (Encounter RTL Compiler), PD-aware physical implementation (SoC Encounter), timing and SI signoff (Encounter Timing System), IR-drop and power signoff (VoltageStorm-PE & DC), and physical verification, with CPF integration and quality checks plus LEC and power checks by Conformal Low Power.]
While addressing low-power implementation and its verification, it is also important that the methodology be adequate to deal with the challenges of maintaining performance and reliability. Here are the key issues addressed by the implementation methodology:
Power shut-off and MSV implementation
• Maintaining the system performance is a challenge
• CPF-based methodology simplifies the low-power insertion
Low-power verification
• Verifying the low power
• Through RTL and gate simulations
• Through formal checks
Ensuring reliability
• New power structures and strategy may lead to
• Defects in the t>0 time
• Needs to be taken care of in the design
• ARM has come up with a new approach in the design
• To avoid electrical failures
• How implementation would support such a mechanism
Ulterior Implementation
Low-power Verification
Low-power verification is the backbone of any low-power flow. Verification can be performed through dynamic simulation on the RTL as well as the gate netlist, and through static checks.
Cadence Encounter® Conformal® Low Power verifies the correct implementation of low-power design techniques and validates the design using formal techniques (versus simulation) throughout the design process. It also decreases the risk of missed bugs before a product goes out the door. Conformal Low Power accepts RTL/gate-level netlists, with or without explicit power or ground nets, and a CPF file as input. It performs structural and rule-based checks to verify that the low-power implementation matches the power specification defined in the CPF file. Under Low Power Equivalency Checking, Conformal Low Power ensures that low-power optimizations do not introduce a technology mapping bug or a logical bug in the design netlist. It reads golden and revised designs along with CPF files and checks the logical equivalence without setting any constraints on low-power control signals.
The RTL and Conformal Low Power flow is used to verify the CPF. It reads RTL and CPF as input and reports missing and redundant low-power rules with respect to the power architecture of the design.
Conformal Low Power flows for the synthesis and physical netlists are used to verify the low-power implementation against the power specification defined in the CPF file. Since instances in the synthesis netlist do not have power and ground pins, power domains are assigned based on the CPF definition. The power domains of the instances in the physical netlist are assigned based on the power and ground pin connectivity. The power domain consistency check (PDCIC) performs power-aware equivalence checking and checks low-power cells. The PDCIC between the synthesis and physical netlists performs the power-aware equivalence checking between the golden and revised designs. In this case, it assigns the power domains for the synthesis netlist using the CPF definitions, while the power domains for the physical netlist come from the power and ground pin connectivity.
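To make the idea of a structural, rule-based check concrete, the sketch below flags domain crossings that leave a switchable domain without an isolation cell. It is a deliberately simplified illustration and not Conformal Low Power's algorithm or API; the `Crossing` type, the `missing_isolation` function, and the treatment of PDsoc as the only always-on domain are assumptions.

```python
from typing import List, NamedTuple


class Crossing(NamedTuple):
    net: str
    src: str        # power domain driving the net
    dst: str        # power domain receiving the net
    isolated: bool  # True if an isolation cell sits on the crossing


# Assumption for this sketch: PDcore and PDcpu can be shut off, PDsoc is always on.
SWITCHABLE = {"PDcore", "PDcpu"}


def missing_isolation(crossings: List[Crossing]) -> List[str]:
    """Return nets that leave a switchable domain without isolation."""
    return [
        c.net
        for c in crossings
        if c.src in SWITCHABLE and c.src != c.dst and not c.isolated
    ]
```

A real checker also has to consider which domains can be off at the same time, clamp values, and isolation enable timing, none of which is modeled here.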
Figure 13: Verification
Figure 14: Ulterior Backend Flow Overview
[Figure 13 flow: CPF and RTL feed Conformal LP CPF consistency checking; logic synthesis and DFT produce the gate netlist and physical implementation produces the physical netlist, each followed by Conformal Low Power equivalence checking. Low-power check progress: front-end signoff covers CLP (RTL and CPF checks), LEC (RTL vs synthesis netlist), power-aware LEC (RTL vs synthesis netlist), and CLP (synthesis netlist); back-end signoff covers LEC (synthesis vs backend netlist), power-aware LEC (synthesis vs backend netlist), power-aware LEC including PDCIC (synthesis vs physical netlist), CLP (physical netlist), and unified CLP (hierarchical + top level) on the physical netlist.]
[Figure 14 flow boxes: design import; load CPF and create RC and timing optimization MMMC views; floorplanning of power domains and relative macro placement; end cap, well-tap, and power switch cell placement; power planning (PSO, well-tap hookup); placement (isolation, SRPG); always-on net synthesis for RETAIN and PSO control signals; power routing (sroute for LVL secondary/STDCELL preroute, NanoRoute for SRPG secondary/always-on nets); multi-mode pre-CTS optimization; domain-aware CTS; multi-mode post-CTS optimization; SI- and domain-aware NanoRoute; multi-mode post-route optimization; multi-mode post-route SI optimization; multi-mode hold optimization; SoC signoff timing and CeltIC checks; multi-mode leakage optimization.]
Ulterior PnR Flow
It is important that the methodology and flow used for Place and Route (P&R) capture the additional complexity introduced by the power strategy. The backend tool takes the power architecture information from the CPF file and performs all the steps in an automated manner. Figure 14 captures the CPF-based automated low-power P&R flow used in the project.
The flow starts with the design import and loading of the CPF file. Inside the Cadence SoC Encounter® RTL-to-GDSII System, loading and committing the CPF on the design occur through the loadCPF and commitCPF commands respectively.
The loadCPF command mainly captures the following information from the CPF:
− Low-power cells such as level shifter, isolation cell, power switch cell, SRPG, and always-on buffers
− All the power and ground nets
− The power domain with switchable attributes and its global connections
− Attaches libraries to the appropriate power domains
− Rules such as power switch, level shifter, isolation cell, and SRPG
− Different analysis views
The commitCPF command mainly creates the following information according to the loaded CPF:
− Creates power domains and defines their global connections
− Checks and inserts level shifters and isolations based on the rules
− Checks and replaces flip-flops with SRPGs based on the rules
− Creates the analysis views
Once the design is imported and the CPF is loaded, the following are the key steps performed by SoCE:
(i) Low-power CPF flow and the MMMC settings
(ii) Different kinds of power shut-off (PSO) for the design
• On-chip PSO
− Column-based checkerboard PSO
− PSO for hard macro (memory)
• Off-chip PSO
− VDDCPU, secondary domain for VDDCore, can also be shut off externally
(iii) Different kinds of level shifter implementation for the design
− Low-to-high level shifter
(iv) Isolation implementation for the PSO power domains
(v) State retention for the PSO power domains
(vi) Always-on net synthesis
(vii) Secondary power pin connection for the SRPG/always-on buffer/LVL shifter
(viii) Placement in MMMC
(ix) On-chip Variation (OCV) timing analysis mode
(x) Timing optimization and analysis in MMMC
(xi) Clock tree synthesis in MMMC
(xii) Domain-aware routing
(xiii) MMMC SI closure
(xiv) Hold timing optimization in MMMC
(xv) MMMC leakage optimization
(xvi) Running multiple-CPU processing to reduce the runtime for multiple mode analysis
Figure 15: Floorplan and power plan
Figure 15 illustrates the floorplan and the power switch columns of the Ulterior design.
The key to the implementation is the power network planning. As described in the power strategy section, two types of power switch network are needed: the weak network and the strong network. The weak network itself has eight different enables and the same number of acknowledgements.
Weak Network Enable
As shown in Figure 16, 8 weak enables feed 16 columns that are spread uniformly and interleaved; this reduces the rush current and brings the power grid up gently to 95% of VDD. Every vertical weak column skips a certain number of cell rows (13 rows, to be precise), and the skipped rows hold either a strong network switch cell or a weak network return-path switch cell.
Figure 16: Weak Network Enable
Weak Network Acknowledgement
As shown in Figure 17, there are 16 separate acknowledgements on the return path of the weak network. Of these 16 acknowledgements, 8 have been connected to the ALC state machine based on STA measurements.
Figure 17: Weak Network Acknowledgement
[Figure 16 signals: REQ_WEAK_0 through REQ_WEAK_7.]
[Figure 17 signals: ACK_WEAK_0 through ACK_WEAK_7 and ACK_WEAK_0_1 through ACK_WEAK_7_1.]
Strong Network Request
The strong network is used to reduce the IR-drop once the power has ramped up to 95% of VDD through the weak power network. The strong network has more PSO switches than the weak network, but the number should not be so high that leakage through these switches becomes an issue. As shown in Figure 18, a single request for the strong network feeds 16 uniformly spread columns, and every column has 351 strong network cells. The implementation therefore ends up with 5616 strong power switches (16 × 351), and the total leakage through the PSO is 0.6mW.
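The switch count and an order-of-magnitude leakage per switch follow from simple arithmetic. The sketch below assumes, for illustration only, that the entire 0.6mW is attributed to the strong switches, which makes the per-switch figure an upper bound; the function names are hypothetical.

```python
COLUMNS = 16                  # strong network columns, from the text
SWITCHES_PER_COLUMN = 351     # strong network cells per column, from the text
TOTAL_PSO_LEAKAGE_W = 0.6e-3  # total leakage through the PSO, from the text


def strong_switch_count(columns=COLUMNS, per_column=SWITCHES_PER_COLUMN):
    """Total number of strong power switches."""
    return columns * per_column


def leakage_per_switch_w(total_w=TOTAL_PSO_LEAKAGE_W, count=None):
    """Upper-bound leakage per strong switch, assuming all PSO leakage is in them."""
    if count is None:
        count = strong_switch_count()
    return total_w / count


if __name__ == "__main__":
    print(strong_switch_count())                            # 5616
    print(f"{leakage_per_switch_w() * 1e9:.0f} nW per switch")  # about 107 nW
```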
Figure 18: Strong Network Request
Strong Network Acknowledgment
Figure 19 captures the return path of the strong network with 16 acknowledges; STA measurement has been performed to choose one out of the 16 to connect to the ALC controller.
Figure 19: Strong Network Acknowledge Return Path
[Figure 18 signal: REQ_STRONG.]
[Figure 19 signals: ACK_STRONG_0 through ACK_STRONG_7 and ACK_STRONG_0_1 through ACK_STRONG_7_1.]
Well Tap Cells
The standard cell library from ARM supports the back-bias technique for reducing leakage. This technique is not used in this project, but proper well taps still need to be inserted for bulk connections. Figure 20 shows the well tap placement in the design. The SoC Encounter system automatically takes care of the domain association of the well tap cells while inserting them.
Figure 20: Well Tap Cells
Isolation Cells Insertion and Placement
Isolation cells are inserted between the always-on and switchable domains based on the isolation rules and cells specified in the CPF file. As shown in Figure 21, in the Ulterior design, isolation has been placed between PDsoc and PDcore, PDcpu and PDcore, and PDsoc and PDcpu. These isolation cells are placed in the ON domain.
Figure 21: Isolation Cells
SRPG Cells
There are ~40k state retention flops in the PDcore PSO power domain (Figure 22). While all the flops are inserted during synthesis, their placement and power connections are performed during P&R. The secondary power pin connection to always-on power for state-retention flops is shown in Figure 23; the SRPG is a double-height cell with VSS at the bottom. The entire secondary power hookup for the SRPG cells was done using the Cadence NanoRoute router.
Figure 22: SRPG Cells Spread
Figure 23: SRPG Standard Cell
Ulterior Power Sign-Off
With the multiple power switch networks and multiple enable conditions for the weak network, this project required extensive power switch simulations as illustrated in Figure 24. Here is the brief summary of the analysis performed:
• Power calculation
− Using Common Power Engine (CPE) for power calculation for different modes and corners
− Using Powermeter to generate dynamic current waveforms with a vectorless approach
• Static and dynamic IR-drop analysis
− Using Cadence VoltageStorm power analysis for static IR-drop analysis and EM check with power from CPE
− Using VoltageStorm for dynamic IR-drop analysis with current waveforms from Powermeter
• Power gating analysis for the multiple request enable conditions
− Using Powermeter to generate spice decks and Ultrasim to run power-up simulation
− Using VoltageStorm for dynamic IR-drop analysis with rush current from power-up
− Performing ECOs on the power switch network to get the ramp-up time, rush current, and request-to-acknowledge time as per the required specs
• Decap ECO flow and PSO ECO flow
Figure 24: Power Analysis Flow
[Figure 24 flow: library data (Spice model, .cl, .lib, .spice) and design data from the front end (DEF, LEF, SPEF, TWF) feed Powermeter power calculation and Powermeter power-up deck generation; Ultrasim runs the PSO spice simulation, producing .tm waveforms and a .pti peak rush current file; VStorm static IR/EM analysis and VStorm dynamic IR analysis produce plots, reports, a block .cl for top-level analysis, and a decap ECO file. A window-based decap ECO was used to get a rough idea of how much decap was needed.]
*Except VStorm dynamic IR drop, all runs were done in 3 corners (FF, SS, TT). VStorm dynamic IR drop was done in TT only.
Figure 25 illustrates the static IR-drop plots and numbers. The average drop across the PSO varies between 15% and 25% of the total IR-drop, so careful analysis is needed when defining the PSO strategy in order to maintain performance.
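As a back-of-the-envelope check, the 15% to 25% share can be combined with the 6mV average switch drop reported in the implementation results to estimate the implied total static IR-drop. Pairing these two numbers is an assumption made here for illustration, since the source does not state that they come from the same measurement; the function name is hypothetical.

```python
def implied_total_ir_drop_mv(switch_drop_mv, share_low=0.15, share_high=0.25):
    """Range of total IR-drop implied if the switch drop is share_low..share_high of it.

    Returns (minimum total, maximum total) in mV.
    """
    if not 0 < share_low <= share_high <= 1:
        raise ValueError("shares must satisfy 0 < share_low <= share_high <= 1")
    return switch_drop_mv / share_high, switch_drop_mv / share_low


if __name__ == "__main__":
    low, high = implied_total_ir_drop_mv(6.0)
    print(f"implied total static IR-drop: {low:.0f}mV to {high:.0f}mV")
```

With a 6mV switch drop this gives a total of roughly 24mV to 40mV under the stated assumption.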
Figure 25: Static IR-Drop Analysis
Figures 26 and 27 illustrate the power switch network simulation results and waveforms. For simplicity, Figure 27 shows only the limited, optimal set of simulation results from the end of the project; during the power switch network optimization, several combinations and corners were tried to arrive at the optimized network. Rush current (Ipeak and Iavg) is minimal when one weak enable is turned on, but the ramp-up time is 12X longer than with 8 weak enables on. Conversely, the 8-weak-enable condition has 12X more rush current than the one-weak-enable condition.
Figure 26: Power Switch Network Simulation Results
Figure 27: Power-Up Simulation Waveforms
Assembly and Packaging
Figure 28 illustrates the bond diagram of the Ulterior SoC. It uses a 352-pin package for the 4x4 square mm die. It contains the following:
• 180/244 signal pins
• 40 mixed-signal (power measurement)
• 52 power/ground
Figure 28: Assembly and Packaging
Ulterior Implementation Results
While silicon measurements are in progress, here are the stats from the tape-out data measurements:
Gate Count: 1.1M gates, Instance Count: ~300K, 37 memories, 3.09mm2
• Utilization is 80% for core
Performance: 615 MHz in WC corner (0.9V, SS, 125C)
Power Savings:
• Ulterior total leakage power savings: 3X
• VDDCore domain leakage power savings: 100X
2 types of power switch network:
• Weak: 14 columns, 8 enables, 280 total switches, 20 per column
• Strong: 16 columns, 1 enable (16 acks), 5616 total switches, 351 per column
Extensive power analysis
• Ramp-up time for every corner (with combination of weak enables)
• Ranges from 8ns to 107ns
• Rush current control with multiple combinations (ranging from 39mA to 481mA)
• Current limit spec has been met for HEADER follow pin
• Through multiple iterations
• Drop through the switch 6mV (average), range+ 3mV
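The reported ranges can be cross-checked against the roughly 12X tradeoff described for the power switch simulations. The sketch below assumes the shortest ramp pairs with the highest rush current and the longest ramp with the lowest, and it uses current times ramp time as a crude charge proxy; both pairings are assumptions, not statements from the source.

```python
RAMP_NS = (8.0, 107.0)   # reported ramp-up time range
RUSH_MA = (39.0, 481.0)  # reported rush current range


def span_ratio(low_high):
    """Ratio between the top and bottom of a reported range."""
    low, high = low_high
    return high / low


def charge_nc(current_ma, time_ns):
    """Crude charge proxy: mA * ns gives pC, converted here to nC."""
    return current_ma * time_ns * 1e-3


if __name__ == "__main__":
    print(f"ramp-up ratio: {span_ratio(RAMP_NS):.1f}x")
    print(f"rush current ratio: {span_ratio(RUSH_MA):.1f}x")
    print(f"slow ramp charge proxy: {charge_nc(RUSH_MA[0], RAMP_NS[1]):.2f} nC")
    print(f"fast ramp charge proxy: {charge_nc(RUSH_MA[1], RAMP_NS[0]):.2f} nC")
```

Both ratios come out between 12x and 14x, and the two charge proxies land within about 10% of each other, which is what one would expect if the same virtual grid capacitance is being charged in every configuration.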
_____________________________________________________
David Flynn, a Fellow in R&D at ARM Ltd, has been with the company since 1991, specializing in System-on-Chip IP deployment and methodology. He is the original architect behind ARM’s synthesizable CPU family and the AMBA on-chip interconnect standard. His current research focus is low-power system-level design. He holds a number of patents in on-chip bus, low power and embedded processing sub-system design and holds a BSc in Computer Science from Hatfield Polytechnic, UK and a Doctorate in Electronic Engineering from Loughborough University, UK. He is currently Visiting Professor with the Electronics and Computer Science Department at Southampton University, UK. (david.flynn@arm.com)
Sachin Idgunji is a Principal Engineer at ARM Inc. in the Research Group, specializing in Systems/Circuit design and analysis. His current research focus is in variation analysis, low power design and statistical techniques. Prior to joining ARM, he was at Synopsys Inc. where he led several projects ranging from design specification through tape out in areas of graphics, networking and embedded processing. Prior to Synopsys, Sachin worked at IBM Labs (India) and PCS-Data General and has over 18 years of industry experience. Sachin holds a BE in Electronics Engineering from Shivaji University, India. (sachin.idgunji@arm.com)
Felix Jen, a section manager in the Design Technology Support Section of IP Development and Design Support Division at UMC, has been with the company since 2002, with expertise mainly focusing on IC design implementation and design methodology. (felix_jen@umc.com)
Wen-Pin Lin, a Senior Technical Manager and Staff in IP Development and Design Support Division at UMC, joined the company in 2007. His expertise mainly focuses on deep sub-micron IC design implementation and design methodology. (wen_pin_lin@umc.com)
Vivek Shukla serves as an R&D Architect at Cadence Design Systems, Bangalore. Before Cadence, Vivek worked at Beceem Communications, a startup in Wi-Max, where he led multiple tape-outs of Wi-Max 802.16e standard compliant chips. Prior to Beceem, he had a 5 year stint at Intel during which he worked and led efforts for Ethernet chips, multi-core processor, CSI development and was responsible for the processor methodologies in a wide range of areas including timing closure, Custom design and mixed signal design. Prior to Intel, he worked at Motorola on DSP processors and led chip designs for automotive applications. He holds a B.Tech in Electronics and Communications Engineering from IT-BHU, India and has 2 design patents in the area of low power and high speed interconnect. (vshukla@cadence.com)