low power system on chip based design methodology
EGCP-461 Low Power Digital
IC Design
Survey paper based on
Low Power System on chip based design methodology
Aakash Patel (893298174) Kashyap Patel (802501205)
Abstract
System-on-chip (SoC) is an advanced technology
that is widely accepted and used by many
organizations. An SoC integrates many
transistors and other electronic components on
a single chip. However, packing smaller devices
onto a single chip requires more computation
and increases power consumption. Designing
low-power SoCs is therefore a main concern of
today's designers. This paper surveys various
low-power techniques: ASNoC, CAPCOM, an IP
reuse methodology and a methodology for a
medical image processor. The ASNoC design
methodology is presented with an explanation
of each step (mapping, performance analysis,
chip floorplan, power and area analysis).
CAPCOM is explained through its use of
mixed-VTH (MVT) cells on the critical path,
together with its cell assignment algorithm. The
IP reuse methodology is described in terms of
IP enhancement, IP database management and
external IP enhancement. We discuss how these
techniques are implemented in SoCs, compare
them with other low-power techniques and
explain how they are beneficial over the
alternatives.
Keywords: SoC (system-on-chip), low power,
low-power design flow, ASNoC, CAPCOM, IP
reuse, temperature-aware SoC methodology
Introduction
To reduce cost, improve performance and
produce a good product, it is necessary to have
an SoC (system-on-chip) that not only
implements function units but also emphasizes
cooperation among them. This cooperation is
determined by the on-chip communication
subsystem. Smaller transistors are more
suitable for on-chip communication, but
reducing the transistor size creates
difficulties such as delay and high power
consumption. Power consumption is a great
concern for portable battery-based SoCs because
their main energy source is a battery, which
can supply only a limited amount of energy. The
earlier in the design cycle energy-saving
measures are applied, the more energy can be
saved: as we move from the system level down
to the register transfer level, the achievable
percentage of power saving decreases. It is
therefore necessary to have a methodology that
reduces delay and power consumption. In this
paper we describe several methodologies for
reducing the power consumption of SoCs. In
section I, we describe how the design
methodology works for system-on-chip. We first
describe the basic structure of an SoC built in
90nm, 1.3V technology, and then present the
low-power design flow at three different
levels. In section II, a design methodology for
application-specific network-on-chip (ASNoC)
is described. This methodology can generate an
optimized hierarchical ASNoC and a shared
memory for various applications. It estimates
power and area based on the floorplan.
It uses 39% less power, 74% less metal area and
59% less silicon area, yet achieves double the
performance of a RAW network. In section III,
we describe the critical-path aware power
consumption optimization methodology
(CAPCOM) using mixed-Vth cells. It provides
effective power saving for low-voltage,
low-power SoC design, demonstrated on a 16-bit
multiplier circuit with 3811 logic cells in
90nm, 1V CMOS technology, where power
consumption is reduced by up to 44.9%. In
section IV, we describe the reuse of
intellectual property (IP) to improve the
system-on-chip design methodology, and a
design methodology for a low-power,
temperature-aware SoC for medical image
processors. In this section, IP is enhanced to
meet desired goals such as power reduction,
high performance and high speed. This requires
integrating three tasks, which are described.
The temperature-aware SoC methodology
includes dynamic thermal management and
dynamic voltage scaling techniques. In section
V, we describe future work for these
methodologies, and in section VI we give our
conclusions.
I-The Design Methodology of Low Power
SoC
Transistor sizes continue to decrease, but
current flows even in the steady state, which
drains battery power and affects performance.
To solve this problem, many low-power
techniques have been developed at different
levels: (1) system level: dynamic voltage
scaling, bus encoding, memory optimization;
(2) algorithm level: modifying the
computational structure; (3) RTL level: glitch
minimization, clock gating and resource
sharing optimization; (4) gate level: gate
sizing, signal-to-pin assignment; (5) circuit
level: transistor sizing.
These techniques require a new methodology,
so a new low-power design flow is developed,
which is described below. An on-chip power
management module and low-power embedded
memories are used to design the SoC.
Fig-1 Basic structure of SoC (Hu Jian)
Fig-1 describes the basic structure of the SoC.
The SoC is a GSM chip that contains all the
analog and digital functionality of a cellular
phone. It is designed as a single-chip solution
that combines mixed-signal and digital
circuitry in 90nm, 1.35V technology. In steady
clock mode, the SoC allows software to attach
to the various processing units so that they
automatically adjust to low power for a given
application. It has three power-saving modes.
(1) Idle mode: the ARM is in a low-power state.
All internal clocks are stopped until any
request or debug request occurs, at which point
the system goes to Run mode. (2) Run mode: the
system is operational and all clocks are
enabled under software control. (3) Sleep mode:
the clocks are stopped if they respond to the
sleep signal. Fig-2 describes the basic working
modes of the SoC chip, where standby mode and
streaming mode are used.
Fig-2 Basic working mode of SoC chip (Hu
Jian)
In the SoC chip, multi-Vt devices are used. The
core devices are regular-Vt and low-Vt. They
offer better performance but have high leakage
current. For low-power applications,
low-leakage devices are used. The main idea of
these devices is to increase the gate oxide
thickness so that the gate leakage current is
reduced. A multi-Vt design builds the whole
circuit from low-leakage devices and uses
regular-Vt and low-Vt devices only on the
critical paths.
The low-power design flow for the SoC is shown
in Fig-3. It has three levels. The first level
is system-level design: architectures and
algorithms are explored for power efficiency,
and power consumption is evaluated for the
different operational modes. Based on this, the
power, performance and area figures are
generated. The second level is RTL design.
Here, RTL is generated to match the
system-level model. Power is analyzed and
optimized at module level and chip level, and
checked against the budget in the various
modes. Power analysis can be done once the RTL
code and test bench are available; at this
point the clock tree is generated internally.
To confirm the result, we go to the third
level, implementation, where a second power
analysis is necessary after synthesis. After
place and route, the power reliability analysis
should be done as the final stage.
Fig-3 Low power design flow (Hu Jian)
For power reduction, the following options are
available:
- Examine the RTL design for clock gating
  cells.
- During synthesis, use automatic clock
  gating in the elaboration step. Here, the
  power compiler inserts clock gating
  cells for registers that are coded in
  the RTL source with enable signals.
A functional test bench that describes the
behavior of the design must exist, because
switching activity plays the main role in
analyzing the whole design.
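The role of switching activity can be made concrete with the standard first-order model of dynamic CMOS power. This relation is textbook background rather than something taken from the surveyed paper; the symbols \alpha (switching activity factor), C_L (switched capacitance), V_{DD} (supply voltage) and f (clock frequency) are introduced here for illustration.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this model, clock gating lowers \alpha for the gated registers and their clock branches, which is one reason the switching activity captured by the functional test bench matters so much for the power estimate.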
Fig-4 shows which design levels offer the most
potential for power reduction. The largest
savings come from changing the algorithm and
the architecture. In other words, if power
consumption forces a design change, it is more
effective to change the algorithm and/or
architecture than to make changes at the lower
levels of the design.
Fig-4 Potential for power reductions (Hu Jian)
After the synthesis step, the estimated power
should meet the power requirement. The power of
the chip should be checked not only with the
functional test bench but also with the
worst-case power consumption, the reset
behavior and the scan pattern re-simulation.
This is needed to be sure that the chip still
works after test, because the test step may
have the highest power consumption.
II-Application specific network on chip
designing for low power SOCs
Designing a system-on-chip (SoC) well is very
important, and using the application-specific
network-on-chip (ASNoC) method helps to improve
performance and to reduce cost and power
consumption. Two challenges arise in SoC
applications: computation and delay. The
computation problem occurs when smaller
transistors are used on an SoC to provide
multiple functions, which demands more
functionality and more communication with
other parts of the system. More communication
in turn makes it difficult to manage the timing
of every individual part of the system, and
this causes delay. The ASNoC methodology was
designed to overcome these problems. Initially,
NoC methods based on switching and routing
technology were introduced. Embedded systems
generally vary from one system to another,
which makes communication difficult between
parts that also differ in size. For example, a
microprocessor requires more bandwidth than a
USB-based device. The ASNoC design methodology
makes communication in the system easier and
automates it.
Methodology
The general design methodology of ASNoC is
shown in Fig-5. ASNoC combines the
communication and protocol for a system whose
parts have different requirements. The
methodology follows the steps shown in the
figure. Overall, it starts with mapping the
architecture and behavior model of the system.
Performance analysis is then done based on the
floorplan estimation, and finally power and
area are checked based on the switch and link
design. The ASNoC methodology is completed by
performing all of these steps.
Fig-5 describes the design methodology flow
for ASNoC. Here, we discuss each step of the
ASNoC methodology in detail.
Fig-5 Design methodology ASNoC (Design
ASNoC for Low-Power SoCs)
Mapping
The ASNoC methodology has two main inputs: the
computation architecture and the behavior
model. The behavior model specifies the
functionality of the application using the
programming languages C and SystemC. The
computation architecture shows the
computational nodes and their connections;
from this information the functionality and
interconnection can be determined.
Fig-6 Behavior model and computation
architecture (Design ASNoC for Low-
Power SoCs)
The figure above shows an example of a behavior
model with its computation architecture. The
methodology can be used to design any
SoC-based application. Here, for a decoder
model, the behavior model captures the
functionality, which is modeled and distributed
over the computation architecture. The input
video stream and the frames are handled by
input/output agents. Five processors are shown
in the example: P0, P1, P2, P3 and P4. P0 and
P1 implement entropy decoding. P2 implements
intra-frame prediction, motion compensation
prediction and the inverse transform. The
deblocking filter stage is implemented on two
processors, P3 and P4.
Communication Analysis
The behavior model is simulated and used to
generate a pattern for the computational
nodes. Depending on the abstraction level
available for the nodes, we obtain traffic and
other details for each specific node. If a
higher-level abstraction is available, it helps
to obtain more specific details. The pattern
contains different types of information such
as frequencies, sizes and traces. A
communication trace is available for every
computational node. Every trace has multiple
entries; each entry records the gap between the
previous and the current network access, the
memory address and the size of the data. The
gap between network accesses is measured in
clock cycles, which makes it easier to describe
the communication between nodes. If blocking
occurs, it affects the network access time and
also all other network accesses. All of this
information helps in designing the ASNoC.
Fig-7 Communication graph (Design ASNoC
for Low-Power SoCs)
A general example of a communication graph is
shown in the figure above. I and O are the
input/output agents and P0 to P4 are
processors. The traces provide the information
needed to determine the average communication
traffic. The figure is the result of the
communication analysis.
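As an illustration of how traces can be reduced to a communication graph, the following is a minimal sketch, not the tool used in the surveyed work. The trace format (gap in cycles, destination node, size in bytes), the node names and all numbers are assumptions made for the example; in particular, the destination node is assumed to have been resolved from the memory address already.

```python
from collections import defaultdict

# Hypothetical trace entries per node: (gap_cycles, destination_node, size_bytes).
# gap_cycles is the number of clock cycles since the node's previous network access.
traces = {
    "P0": [(10, "P2", 64), (30, "P2", 64), (20, "P1", 32)],
    "P1": [(40, "P2", 128), (40, "P2", 128)],
}

def communication_graph(traces):
    """Return {(src, dst): average bytes per cycle} computed from per-node traces."""
    graph = defaultdict(float)
    for src, entries in traces.items():
        total_cycles = sum(gap for gap, _, _ in entries)
        for _, dst, size in entries:
            # Each entry contributes its size spread over the node's traced time.
            graph[(src, dst)] += size / total_cycles
    return dict(graph)

graph = communication_graph(traces)
for (src, dst), rate in sorted(graph.items()):
    print(f"{src} -> {dst}: {rate:.2f} bytes/cycle")
```

Each edge weight is an average rate over the traced interval, which is the kind of quantity a communication graph such as Fig-7 summarizes.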
Statistical trace modeling
In trace modeling, the collected traces are
modeled, and probability distributions are used
to capture the communication behavior of every
node. This plays an important role in
performance analysis. Statistical analysis also
makes hardware and software design easier, and
it is useful for predicting the behavior of
similar systems.
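A simple way to picture statistical trace modeling is an empirical distribution over the observed access gaps, from which synthetic traffic can be drawn. The sketch below assumes this empirical approach and uses made-up gap values; the surveyed methodology may fit different distributions.

```python
import random
from collections import Counter

def empirical_distribution(samples):
    """Map each observed value to its relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}

def synthesize(dist, n, seed=0):
    """Draw n synthetic values that follow the empirical distribution."""
    rng = random.Random(seed)
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=n)

# Hypothetical gaps (in clock cycles) between network accesses of one node.
observed_gaps = [10, 10, 20, 10, 40, 20, 10, 20]
gap_dist = empirical_distribution(observed_gaps)
synthetic = synthesize(gap_dist, 5)
print(gap_dist)
print(synthetic)
```

Synthetic traces drawn this way can drive a simulator without replaying the full behavior model.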
ASNoC architecture and protocol design
The communication graph from the communication
analysis, together with the details of the
traces and their entries, helps in designing
the ASNoC architecture and protocol. A
regenerating method is used to generate the
ASNoC hierarchy. Memory is arranged and
distributed according to the requirements of
the system. Every node has local memory that
stays until the end of that step. The number of
hierarchy levels depends on the number of
nodes. Communication locality is maximized by
using multiple network levels and their network
hierarchies. The ASNoC is designed by going
through the following steps.
1) Determine all sets Pn = {G1, G2, ..., Gn},
where G is the communication graph of each
node and P is the set of all these graphs.
2) Express P in terms of network cost and
find the cost between local networks, which
is denoted by β.
3) Gather the partitions that have the lowest
costs.
4) Connect the nodes of graph G using
switches and join the graphs using links.
5) Generate a shared memory by combining
all local memories.
For the ASNoC architecture, the number of
computational nodes is given by the order of
the graphs, and the maximum number of nodes is
M, which is the maximum number of switches in
the system.
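The partitioning idea in steps 2) and 3) can be sketched with a deliberately simplified cost model. The code below is an assumption-laden illustration, not the algorithm of the surveyed paper: it considers only a single two-way split, charges traffic inside a local network a factor of 1 and traffic between local networks a factor BETA, and limits the size of each local network with MAX_CLUSTER. The traffic figures are invented.

```python
from itertools import combinations

# Hypothetical average traffic between node pairs (for example bytes per cycle).
traffic = {
    ("P0", "P1"): 4.0,
    ("P1", "P2"): 1.0,
    ("P2", "P3"): 5.0,
    ("P3", "P4"): 3.0,
    ("P0", "P4"): 0.5,
}
nodes = ["P0", "P1", "P2", "P3", "P4"]
BETA = 3.0        # assumed cost factor for traffic crossing local networks
MAX_CLUSTER = 3   # assumed limit on nodes per local network

def partition_cost(cluster_a, traffic, beta):
    """Cost of a two-way partition: local traffic costs 1x, crossing traffic beta x."""
    cost = 0.0
    for (u, v), rate in traffic.items():
        crossing = (u in cluster_a) != (v in cluster_a)
        cost += rate * (beta if crossing else 1.0)
    return cost

def best_bipartition(nodes, traffic, beta, max_cluster):
    """Exhaustively search two-way partitions and keep the cheapest one."""
    best = None
    for size in range(1, len(nodes)):
        if size > max_cluster or len(nodes) - size > max_cluster:
            continue
        for group in combinations(nodes, size):
            cluster_a = set(group)
            cost = partition_cost(cluster_a, traffic, beta)
            if best is None or cost < best[0]:
                best = (cost, cluster_a, set(nodes) - cluster_a)
    return best

cost, local_a, local_b = best_bipartition(nodes, traffic, BETA, MAX_CLUSTER)
print(cost, sorted(local_a), sorted(local_b))
```

The search keeps heavily communicating nodes in the same local network, which is the locality effect the hierarchy is meant to exploit. A hierarchical design would apply a split of this kind repeatedly.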
Fig-8 One partition and corresponding NoC
architecture (Design ASNoC for Low-Power
SoCs)
The figure above shows one partition of the NoC
architecture. The NoC is a combined network of
predesigned library components. Combining all
local memories in the NoC creates a distributed
shared memory. However, NoC-based memory has
a higher communication cost than memory
designed into the computation architecture.
Floorplan Estimation
The chip floorplan provides the length of each
link, which helps to determine the delay of
that link in NoC clock cycles. Fig-9 shows a
chip floorplan.
Fig-9 chip floor plan (Design ASNoC for
Low-Power SoCs)
Each link contributes to the area and power of
the system, so the link lengths determine the
most suitable area and power of the NoC.
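The step from floorplan to link delay can be sketched as follows. The coordinates, the block names and the distance a signal covers per clock cycle are all hypothetical values chosen for the example, and Manhattan routing between block centres is an assumption.

```python
import math

# Hypothetical block centre coordinates in millimetres taken from a floorplan.
centres = {"S0": (1.0, 1.0), "S1": (4.0, 1.0), "MEM": (4.0, 5.0)}
REACH_MM_PER_CYCLE = 2.0  # assumed distance a signal travels in one clock cycle

def link_length(a, b):
    """Manhattan distance between two block centres, in millimetres."""
    (xa, ya), (xb, yb) = centres[a], centres[b]
    return abs(xa - xb) + abs(ya - yb)

def link_cycles(a, b):
    """Number of clock cycles needed to traverse the link (at least one)."""
    return max(1, math.ceil(link_length(a, b) / REACH_MM_PER_CYCLE))

for pair in [("S0", "S1"), ("S1", "MEM"), ("S0", "MEM")]:
    print(pair, link_length(*pair), "mm,", link_cycles(*pair), "cycles")
```

Longer links cost more cycles as well as more metal area and power, which is why the floorplan feeds both the performance and the power and area analyses.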
Performance Analysis
This step evaluates the performance of the
application, not of the network. It runs a
simulation and determines the execution or
decoding time of one frame, or the number of
frames decoded per second. For industrial
purposes OPNET is used as the simulator in this
step, and it is better than other simulators in
terms of speed and stability. OPNET uses the C
or C++ language for modeling and runs the
simulation using the statistical traces. It
provides detailed performance information for
each node of the NoC design. Researchers
compared ASNoC with RAW in a performance
analysis of an SoC application and found that
ASNoC completes execution in 50 percent less
time than RAW.
Switch and link design
In this step a library of general information
is designed; it provides details of every
network component at any time during system
operation. It also gives information about the
power, area and cycle accuracy of every
component, and includes information on
switches, links and other implementations.
Power and area analysis
The library provides power information for
every network activity. The power information
is collected in the library generation step and
is also derived from SPICE simulation. OPNET
captures every activity performed in the
network, and adding up the power of each
activity gives the power of the NoC. The
library also provides the silicon and metal
area of each component, and summing these gives
the total area of the NoC.
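The accumulation described above amounts to a weighted sum over a library. The following sketch uses an invented library: the activity names, energy values, area values, cycle count and clock frequency are placeholders, not figures from the surveyed work.

```python
# Hypothetical library: energy per activity in picojoules, area per component in mm^2.
ENERGY_PJ = {"switch_traversal": 1.2, "link_traversal": 0.8, "buffer_write": 0.5}
AREA_MM2 = {
    "switch": {"silicon": 0.10, "metal": 0.02},
    "link": {"silicon": 0.00, "metal": 0.05},
}

def noc_power_mw(activity_counts, cycles, clock_hz):
    """Average power in milliwatts: total activity energy divided by elapsed time."""
    energy_pj = sum(ENERGY_PJ[name] * count for name, count in activity_counts.items())
    seconds = cycles / clock_hz
    return energy_pj * 1e-12 / seconds * 1e3

def noc_area(components):
    """Return (silicon, metal) area in mm^2 summed over all component instances."""
    silicon = sum(AREA_MM2[c]["silicon"] * n for c, n in components.items())
    metal = sum(AREA_MM2[c]["metal"] * n for c, n in components.items())
    return silicon, metal

# Activity counts as a simulator might report them for one run (made-up numbers).
activity = {"switch_traversal": 1000, "link_traversal": 1500, "buffer_write": 1000}
power_mw = noc_power_mw(activity, cycles=10_000, clock_hz=100e6)
silicon_mm2, metal_mm2 = noc_area({"switch": 4, "link": 6})
print(f"{power_mw:.3f} mW, {silicon_mm2:.2f} mm^2 silicon, {metal_mm2:.2f} mm^2 metal")
```

Keeping silicon and metal area separate mirrors the way the results are reported, since links mostly consume metal while switches consume both.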
Researchers have compared ASNoC with the NoC
and RAW methodologies for some SoC design
applications, and the results show that ASNoC
is better than the other methodologies. It uses
39 percent less power, 59 percent less silicon
area and 74 percent less metal area. The ASNoC
design methodology is therefore very useful for
SoC applications.
III-Critical-Path Aware Power
Consumption Optimization
Methodology (CAPCOM) for low power
SoC design
CAPCOM using mixed-VTH (MVT) cells offers
advantages in achieving lower power at a lower
supply voltage. In the MVTCMOS technique,
assigning low-VTH (LVT) devices to critical
paths and high-VTH (HVT) devices to
non-critical paths reduces the power
consumption of the SoC without affecting the
speed and performance of the system. Commonly,
in MVTCMOS techniques, each cell uses either
LVT or HVT devices. To lower power consumption
further, MVT cells are used together with LVT
and HVT devices; however, this increases the
complexity of the system. CAPCOM uses
mixed-VTH cells with an unassigned combination
of LVT/HVT cells, and the MVT cells are placed
on specific design paths to reduce power
consumption. In this method, the cell
assignment procedure is written in the TCL
language and interfaced with the analysis
software.
Fig-10 relation between cell and critical path
in circuit design (Designs)
The figure above shows the relation between
cells and critical paths in a circuit design.
Using the critical-path weighted sensitivity,
each cell is allocated to an LVT, HVT or MVT
cell, which reduces the power consumption of
the circuit. How much each cell influences the
overall circuit timing depends on the
particular circuit design. In the figure, cell
U1 lies on three critical paths and cell U2 on
only one. If an LVT/MVT/HVT swap is applied to
U1, it changes the timing of the overall
circuit more, so U1 is more important than U2.
In the MVTCMOS methodology, sensitivity is
generally used to decide which cells to swap
among LVT/MVT/HVT in the circuit.
Suppose two different cell versions are
available for the ith cell in the circuit, for
example LVT and HVT. The sensitivity of the
cell is then given by the following formula.
S_i = (P_i - P_i') / (D'_{r,i} + D'_{f,i} - D_{r,i} - D_{f,i})
Here, P_i and P_i' are the average power
consumption of the ith cell for the two cell
versions, and D_{r/f,i} and D'_{r/f,i} are the
corresponding rise/fall propagation delays of
the output. Substituting the values into this
equation gives the sensitivity for the two
cell versions. The equation is then modified
to take the critical paths into account; the
result is called the critical-path weighted
sensitivity and is given by the following
equation.
CPWS_i = (P_i - P_i') / (C_i (D'_{r,i} + D'_{f,i} - D_{r,i} - D_{f,i}))
C_i is the number of critical paths passing
through the ith cell. The main difference
between CPWS_i and S_i is that CPWS_i is
multiplied by (1/C_i) to capture the timing
effect of each cell on the overall circuit.
Fig-11 cell assignment algorithm flow chart
(Designs)
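The two sensitivity measures can be evaluated numerically. The sketch below follows the formulas in the text; the power and delay values are invented, and the cell names U1 and U2 reuse the example of Fig-10 with three and one critical paths respectively.

```python
def sensitivity(p_lvt, p_alt, d_lvt, d_alt):
    """S_i: power saved per unit of added (rise + fall) propagation delay."""
    dr, df = d_lvt
    dr_alt, df_alt = d_alt
    return (p_lvt - p_alt) / (dr_alt + df_alt - dr - df)

def cpws(p_lvt, p_alt, d_lvt, d_alt, critical_paths):
    """CPWS_i: S_i scaled by 1/C_i; critical_paths must be greater than zero."""
    return sensitivity(p_lvt, p_alt, d_lvt, d_alt) / critical_paths

# Hypothetical numbers: power in microwatts, (rise, fall) delay in picoseconds.
s = sensitivity(10.0, 6.0, (50.0, 50.0), (60.0, 60.0))
cpws_u1 = cpws(10.0, 6.0, (50.0, 50.0), (60.0, 60.0), critical_paths=3)
cpws_u2 = cpws(10.0, 6.0, (50.0, 50.0), (60.0, 60.0), critical_paths=1)
print(s, cpws_u1, cpws_u2)
```

With identical power and delay figures, the cell on three critical paths receives a CPWS value one third of that of the cell on a single critical path, so it is a less attractive candidate for swapping.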
Fig-11 shows the main cell allocation algorithm
of the MVTCMOS methodology. It starts from a
circuit designed entirely with LVT cells. The
LVT-to-HVT S_i/CPWS_i values are calculated
and used in the first and second optimization
stages. The LVT-to-MVT S_i/CPWS_i values are
then calculated and used in the third and
fourth optimization stages. In every
optimization stage, the LVT cell with the
largest S_i/CPWS_i is selected. CPWS_i is used
for selecting cells on critical paths, but it
causes a problem for cells on non-critical
paths, where it is always zero and leaves the
circuit unbalanced, so S_i is used for
selecting those cells. After the LVT cell with
the largest S_i/CPWS_i has been selected, the
algorithm swaps it with an MVT or HVT cell,
checks the propagation delay and compares it
with the overall system performance. If the
timing is still met, the algorithm stores the
move and continues with the LVT cell that has
the second-largest S_i/CPWS_i; if not, the
cell is restored to LVT and the program
likewise continues with the cell that has the
second-largest S_i/CPWS_i. That cell is then
swapped with an HVT/MVT cell, the circuit delay
is determined, and it is decided whether the
cell must return to LVT. The procedure
continues with the third-largest cell and so
on until the entire circuit has been analyzed.
Fig-12 optimization of each cell (Designs)
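The swap-check-restore loop can be sketched as a greedy pass over the cells. This is a simplified, single-stage illustration under stated assumptions, not the four-stage CAPCOM flow: it considers only LVT-to-HVT swaps, ranks cells by a plain power/delay sensitivity, and uses a toy timing model in which a path delay is the sum of its cell delays. All cell data, paths and the timing limit are hypothetical.

```python
# Hypothetical cell data: (delay in ps, power in uW) for each threshold option.
CELLS = {
    "U1": {"LVT": (50, 10.0), "HVT": (70, 6.0)},
    "U2": {"LVT": (40, 8.0), "HVT": (55, 5.3)},
    "U3": {"LVT": (30, 6.0), "HVT": (40, 4.5)},
}
PATHS = [["U1", "U2"], ["U1", "U3"], ["U3"]]  # assumed timing paths
MAX_DELAY = 110                                # assumed timing constraint in ps

def worst_delay(assignment):
    """Longest path delay under the toy model (sum of cell delays on a path)."""
    return max(sum(CELLS[c][assignment[c]][0] for c in path) for path in PATHS)

def total_power(assignment):
    """Total power of all cells for the given threshold assignment."""
    return sum(CELLS[c][assignment[c]][1] for c in assignment)

def sensitivity(cell):
    """Power saved per unit of added delay when moving the cell from LVT to HVT."""
    (d_l, p_l), (d_h, p_h) = CELLS[cell]["LVT"], CELLS[cell]["HVT"]
    return (p_l - p_h) / (d_h - d_l)

def assign_cells():
    """Start all-LVT, try HVT in order of decreasing sensitivity, undo on violation."""
    assignment = {c: "LVT" for c in CELLS}
    for cell in sorted(CELLS, key=sensitivity, reverse=True):
        assignment[cell] = "HVT"
        if worst_delay(assignment) > MAX_DELAY:
            assignment[cell] = "LVT"  # restore: the swap broke timing
    return assignment

result = assign_cells()
print(result, worst_delay(result), total_power(result))
```

In this toy run the swap of U2 is rejected because it would push the U1-U2 path past the limit, while U1 and U3 stay on HVT, which lowers total power without violating the timing constraint.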
Researchers analyzed CAPCOM and compared it
with other methodologies. CAPCOM was
implemented on a 16-bit multiplier circuit;
initially, a library with different
configurations of LVT, MVT and HVT was used
and the circuit was tested. Then other
MVTCMOS techniques, GDSPOM and CBLPRP, were
implemented on a 16-bit multiplier circuit and
the results were observed. The following figure
plots the power reduction for these MVTCMOS
techniques. It indicates that CAPCOM achieves
the largest power reduction of the three, up to
44.9 percent.
Fig-13 Power reduction achieved by
GDSPOM, CBLPRP and CAPCOM (Designs)
IV-Reuse of Intellectual Property (IP) and Temperature-Aware SoC for Medical
Image Processors
To design complex multi-gate SoCs, designers
reuse IP blocks to meet the challenges of
performance and power consumption. Because IP
blocks are pre-verified and pre-designed, the
designer can concentrate on the system level.
In practice, however, IP reuse has rarely been
as beneficial as expected. Ironically, this is
because of issues surrounding IP quality.
To meet the SoC design goal, it is necessary
to modify some IPs before they can be reused in
context. This is often too hard for the IP
sourcing team because resources are
unavailable. In that case, we can enhance the
IPs, validate the changes and release the IPs
several times during the SoC design phase.
Here we list a few situations in which the
inherited IPs needed to be enhanced. The most
frequent changes were related to increasing the
test coverage. This was required because these
IPs were inherited from sources where meeting a
low DPPM had not been important. The
enhancements fall into the following
categories:
(1) In the inherited IPs, some flip-flops are
not testable, which leads to lower coverage. A
clock multiplexer must therefore be added to
clock these flip-flops in test mode. Here we
reduce power consumption by up to 15%.
Fig-14 test clock multiplexing (Soujanna
Sarkar)
Here, the functional clock domains are used for
test clock multiplexing based on the clock
frequency, so that the test clock frequency is
less than or equal to the functional clock
frequency.
(2) Logic must be added to disable the
asynchronous reset in test mode and to enable
scan shift. Here, we can reduce power
consumption by up to 18-20%.
Fig-15 Disable asynchronous reset (Soujanna
Sarkar)
(3) Flip-flops that are negative-edge triggered
in functional mode are converted into
positive-edge triggered flops. This enables
BIST implementation.
Fig-16 clock edge conversion (Soujanna
Sarkar)
(4) Some IPs have small memories that do not
have BIST capability, so it is necessary to add
memory instances.
Table-1 lists some modules with their coverage
in the original IP and after the changes.
In SoCs that are used in car radio
applications, different radio peripherals are
configured and controlled via the I2C protocol.
Table-1 (Soujanna Sarkar)
High power density leads to on-chip hot spots,
which cause thermal hazards for the system.
This problem is aggravated by the low-cost,
low-power thermal packaging often found in
SoCs. Temperature is therefore a main focus in
system design.
Dynamic voltage scaling (DVS) reduces energy
as much as possible without violating the
timing constraints. Fig-17 shows the block
diagram of the SoC design using the SRAD
algorithm.
Fig-17 Block diagram of the SoC design with
SRAD algorithm
In SRAD, the timing constraints are set by the
output display frame rate. The workload is
proportional to the number of iterations that
SRAD needs to achieve a specific S/N ratio. The
most significant bits of the current pixel
values can be compared in each iteration until
they converge, and only a small window of the
image needs to be monitored. The DVS
implementation for SRAD takes advantage of the
observation that ultrasound medical images
change very slowly and gradually compared with
other imaging systems such as movies. The same
iteration number can therefore be used for a
number of subsequent frames until a new number
is needed, depending on data convergence. A
look-up table is calculated in advance from
simulations for different iteration counts and
is used to choose the corresponding VDD and
clock frequency.
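A look-up table of this kind can be sketched as a sorted list of operating points. The iteration limits, voltages and frequencies below are placeholders, not values from the surveyed design, and the rule of clamping to the fastest entry when the count exceeds the table is an assumption.

```python
import bisect

# Hypothetical table rows: (max iteration count, VDD in volts, clock in MHz),
# sorted by iteration count. Real entries would come from simulation.
DVS_TABLE = [(50, 0.8, 100), (100, 1.0, 200), (200, 1.2, 400)]

def operating_point(iterations):
    """Pick the lowest (VDD, clock) pair whose iteration limit covers the workload."""
    limits = [row[0] for row in DVS_TABLE]
    index = bisect.bisect_left(limits, iterations)
    if index == len(DVS_TABLE):
        index -= 1  # workload beyond the table: clamp to the fastest setting
    _, vdd, mhz = DVS_TABLE[index]
    return vdd, mhz

for count in (40, 50, 51, 250):
    print(count, operating_point(count))
```

Because the iteration count changes slowly from frame to frame, a lookup of this kind only needs to run when the convergence monitor reports a new count.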
Whenever an application exceeds the heat
removal capability, DTM techniques are invoked
until the temperature reaches the desired
level. The DTM techniques include clock gating
and dynamic voltage and frequency scaling
(DVFS).
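The invoke-until-cool behaviour of DTM can be pictured as a small control step. The sketch below is a generic threshold controller written for illustration; the threshold, the number of throttling levels and the temperature readings are assumptions, and each level stands in for a clock gating or DVFS setting.

```python
def dtm_step(temperature_c, level, threshold_c=85.0, max_level=3):
    """Return the next throttling level (0 means full speed).

    Above the threshold the controller throttles harder; otherwise it relaxes
    by one level. The threshold and level count are assumed values.
    """
    if temperature_c > threshold_c:
        return min(level + 1, max_level)
    return max(level - 1, 0)

def run_dtm(readings_c):
    """Apply the control step to a sequence of temperature readings."""
    level, history = 0, []
    for reading in readings_c:
        level = dtm_step(reading, level)
        history.append(level)
    return history

# Hypothetical sensor readings in degrees Celsius.
history = run_dtm([80, 90, 92, 88, 84, 70])
print(history)
```

Throttling rises while the readings stay above the threshold and is released step by step once the temperature falls back to the desired level.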
V-Future Directions
Many other low-power techniques still need to
be researched and tested for use in low-power
applications. The methodologies surveyed here
also need further functionality and
experimentation in order to extend their reach.
VI- Conclusions
Various low-power techniques are presented in
this paper. They are implemented in SoC-based
systems and analyzed. Each technique comes with
different advantages. (1) The design
methodology for low-power SoC is explained,
with information about the basic structure of
the SoC and the low-power design flow. (2) The
ASNoC methodology is presented with a
description of every step. (3) The CAPCOM
technique is demonstrated with a description of
its algorithm and a comparison with other
similar techniques. (4) The IP reuse
methodology for SoC is presented, together with
its database structure and its IP management
and enhancement.
References
1) Zhenyu(Jerry) Qi, Wei Huang, Adam
Cabe, Wenqian Wu, Yan Zhang, Garret
Rose, Mircea R. Stan “A Design
Methodology for a Low-Power,
Temperature-Aware SoC Developed for
Medical Image Processors”
2) Jan M. Rabaey, “Low power design
methodology and design flow”
3) Jiang Xu, Wei Zhang, Kwai Hung Mo,
Zili Shao “Design ASNoC for Low-Power
SoCs”
4) G. J. Y. Lin and C. B. Hsu “Critical-Path
Aware Power Consumption Optimization
Methodology (CAPCOM) Using Mixed-
VTH Cells for Low-Power SOC Designs”
Dept. of Electrical Engineering, National
Taiwan University
5) Soujanna Sarkar, Sanjay Shinde, Subash
Chandar G. “An Effective IP Reuse
Methodology for Quality System-on-Chip
Design”
6) Hu Jian, Shen Xubang, “The Design
Methodology and Practice of Low Power
SoC” School of Computer Science and
Technology, NorthWestern Polytechnical
University
7) Fahad Bin Muslim, Affaq Qamar, Luciano
Lavagno “Low Power Methodology for an
ASIC design flow based on High-Level
Synthesis” Department of Electronics and
Telecommunication Politecnico di Torino,
ITALY
8) David Elléouet, Nathalie Julien,
Dominique Houzet “A High Level SoC
Power Estimation Based on IP Modeling”
9) B. Chung and J. B. Kuo “Gate-Level
Dual-Threshold Static Power Optimization
Methodology (GDSPOM) for Designing
High-Speed Low-Power SOC Applications
Using 90nm MTCMOS Technology”
10) Sunghyun Lee, Sungjoo Yoo, Kiyoung
Choi “An Intra-Task Dynamic Voltage
Scaling Method for SoC Design with
Hierarchical FSM and Synchronous
Dataflow Model”