low power system on chip based design methodology

EGCP-461 Low Power Digital IC Design

Survey paper based on

Low Power System on chip based design methodology

Aakash Patel (893298174) Kashyap Patel (802501205)

Abstract

System-on-chip (SoC) is an advanced technology that is widely accepted and used by many organizations. An SoC integrates many transistors and other electronic components on a single chip. However, packing smaller devices onto a single chip requires more computation, which increases power consumption. Designing low-power SoCs is therefore a main concern of today’s designers. This paper presents several low-power techniques: ASNoC, CAPCOM, an IP reuse methodology, and a methodology for a medical image processor. The ASNoC design methodology is presented with an explanation of each step (mapping, performance analysis, chip floorplan, power and area analysis). CAPCOM is explained through its use of mixed-VTH (MVT) cells on critical paths, together with its cell assignment algorithm. The IP reuse methodology is demonstrated through IP enhancement, IP database management and external IP enhancement. We describe how these techniques are implemented in SoCs, compare them with other low-power techniques, and explain how they are beneficial over the alternatives.

Key-words: SoC (system-on-chip), low-power, low-power design flow, ASNoC, CAPCOM, IP reuse, temperature-aware SoC methodology

Introduction

To reduce cost, improve performance and produce a good product, an SoC (system-on-chip) is needed. An SoC not only implements function units but also emphasizes cooperation among them. This cooperation is determined by the on-chip communication subsystem. Smaller transistors are more suitable for on-chip communication, but reducing transistor size creates difficulties such as delay and high power consumption. Power consumption is a major concern for portable, battery-based SoCs because the battery can supply only a limited amount of energy. The earlier in the design cycle energy saving is addressed, the more energy can be saved; as we move from the system level to the register transfer level, the percentage of achievable power saving falls. It is therefore necessary to have methodologies that reduce delay and power consumption.

In this paper we describe several methodologies for reducing SoC power consumption. In section I, we describe how the design methodology works for a system-on-chip: first the basic structure of an SoC working in 90 nm, 1.3 V technology, then the low-power design flow at three different levels. In section II, a design methodology for application-specific network-on-chip (ASNoC) is described. This methodology can generate an optimized hierarchical ASNoC and a shared memory for various applications, and it uses the floorplan to estimate power and area. It uses 39% less power, 74% less metal area and 59% less silicon area, yet delivers double the performance of a RAW network. In section III, we describe the critical-path aware power consumption optimization methodology (CAPCOM) using mixed-Vth cells. It provides effective power saving for low-voltage, low-power SoC design, demonstrated on a 16-bit multiplier circuit with 3811 logic cells in 90 nm, 1 V CMOS technology, where power consumption is reduced by up to 44.9%. In section IV, we describe reuse of intellectual property (IP) to improve the SoC design methodology, and a design methodology for a low-power, temperature-aware SoC for medical image processors. IP is enhanced to meet goals such as power reduction, high performance and high speed, which requires integrating three tasks that are described in that section. The temperature-aware SoC methodology includes dynamic thermal management and dynamic voltage scaling. In section V, we describe future work, and in section VI we give our conclusions.

I-The Design Methodology of Low Power SoC

Transistor size is continuously decreasing, but current flows even in steady state, which drains battery power and affects performance. To solve this problem, many low-power techniques have been developed at different levels: (1) system level: dynamic voltage scaling, bus encoding, memory optimization; (2) algorithm level: modifying the computational structure; (3) RTL level: glitch minimization, clock gating and resource sharing optimization; (4) gate level: gate sizing, signal-to-pin assignment; (5) circuit level: transistor sizing.

These techniques require a new methodology, so we develop a new low-power design flow, which is described below. We use an on-chip power management module and low-power embedded memories to design the SoC.

Fig-1 Basic structure of SoC (Hu Jian)

Fig-1 describes the basic structure of the SoC. The SoC is a GSM chip that contains all the analog and digital functionality of a cellular phone, designed as a single-chip solution. It combines mixed-signal and digital circuitry in 90 nm, 1.35 V technology. In steady clock mode, the SoC lets software interact with the various processing units so that they automatically adjust to a low-power setting for the application. It has three power-save modes. (1) Idle mode: the ARM is in a low-power state. All internal clocks are stopped until any request or debug request occurs, at which point the system goes to Run mode. (2) Run mode: the system is operational and all clocks are enabled under software control. (3) Sleep mode: the clocks that respond to the sleep signal are stopped. Fig-2 describes the basic working modes of the SoC chip, where standby mode and streaming mode are used.

Fig-2 Basic working mode of SoC chip (Hu Jian)

Multi-Vt devices are used in the SoC chip. The core devices are regular-Vt and low-Vt; they offer better performance but have high leakage current. For low-power applications, low-leakage devices are used. The main idea of these devices is to increase the gate oxide thickness so that gate leakage current is reduced. A multi-Vt design builds the whole circuit from low-leakage devices and uses regular-Vt and low-Vt devices only on the critical paths.

The low-power design flow for the SoC is shown in Fig-3. It has three levels. The first level is system-level design: explore architectures and algorithms for power efficiency and evaluate power consumption in the different operational modes. Based on these, the power, performance and area figures are generated. The second level is RTL design, where RTL is generated to match the system-level model. Power is analyzed and optimized at module level and chip level, and checked against the power budget in the various modes. Power analysis can be done once RTL code and a test bench are available; at this point the clock tree is generated internally. To confirm the result we go to the third level, implementation, where a second power analysis is necessary after synthesis. After place and route, the final reliability power analysis should be done.

Fig-3 Low power design flow (Hu Jian)

For power reduction, the following possibilities are available:

(1) Examine the RTL design for clock gating cells.

(2) During synthesis, use automatic clock gating in the elaboration step. Here, the power compiler inserts clock gating cells for registers that are coded in the RTL source with enable signals.

A functional test bench that describes the behavior of the design must exist for this, because switching activity plays the main role in analyzing the whole design.
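As a rough illustration of why switching activity matters, the sketch below uses the standard CMOS dynamic power relation P = alpha * C * Vdd^2 * f and treats clock gating as a reduction in the effective activity of a register bank. The function names and all numeric values are illustrative assumptions, not figures from the surveyed design.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Standard CMOS dynamic power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz


def gated_activity(alpha, enable_duty):
    """Effective activity when the clock toggles only while enable is high.

    enable_duty is the fraction of cycles with the enable asserted
    (hypothetical parameter, in practice taken from test bench activity).
    """
    return alpha * enable_duty


# Hypothetical register bank: 2 pF switched capacitance, 1.35 V, 100 MHz.
ungated = dynamic_power(alpha=0.5, c_farads=2e-12, vdd=1.35, f_hz=100e6)
gated = dynamic_power(gated_activity(0.5, enable_duty=0.2), 2e-12, 1.35, 100e6)
saving = 1.0 - gated / ungated
print(f"ungated={ungated * 1e6:.2f} uW  gated={gated * 1e6:.2f} uW  saving={saving:.0%}")
```

The saving here depends entirely on the assumed enable duty cycle, which is why the text stresses that a representative functional test bench is needed.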

Fig-4 shows which parts of the design offer the greatest power reduction when changed. The largest savings come from changing the algorithms and the architecture. In other words, if the design must change because of power consumption, do not modify the lower-level design; change the algorithm and/or the architecture instead.

Fig-4 Potential for power reductions (Hu Jian)

After the synthesis step, the estimated power should meet the power requirement. To check the power of the chip, it is not enough to use the functional test bench alone; the worst-case power consumption, the reset behavior and the scan pattern re-simulation should also be checked. The test step matters because the chip may draw its highest power there, so we must be sure that the chip still works after test.

II-Application specific network on chip designing for low power SOCs

SoC design is very important, and the application-specific network-on-chip (ASNoC) method helps to improve performance and to reduce cost and power consumption. SoC applications raise two challenges: computation and delay. The computation problem arises when smaller transistors are used on an SoC to provide multiple functions, which demands more functionality and more communication with other parts of the system. More communication in turn makes it harder to manage the timing of each individual part of the system, and this causes delay. The ASNoC methodology was designed to overcome these problems. Initially, NoC methods based on switching and routing technology were introduced. Embedded systems generally vary from system to system, which makes communication difficult between parts that also differ in size. For example, a microprocessor requires more bandwidth than a USB-based device. The ASNoC design methodology makes communication in the system easier and automates it.

Methodology

The general ASNoC design methodology is shown in Fig-5. ASNoC combines the communication architecture and protocol of the system with the different requirements of its different parts. Overall, the flow starts with mapping the behavior model onto the computation architecture. Performance analysis is then done based on floorplan estimation, and finally power and area are checked based on the switch and link design. The ASNoC methodology is complete once all steps have been performed. Each step is discussed in detail below.

Fig-5 Design methodology ASNoC (Design ASNoC for Low-Power SoCs)

Mapping

The ASNoC methodology has two main inputs: the computation architecture and the behavior model. The behavior model specifies the functionality of the application in a programming language such as C or SystemC. The computation architecture describes the computational nodes and their connections. Together, these give the functionality and the interconnection of the system.

Fig-6 Behavior model and computation architecture (Design ASNoC for Low-Power SoCs)

Fig-6 shows an example of a behavior model with its computation architecture. The methodology can be used to design any SoC-based application. In this decoder example, the behavior model captures the functionality, which is then partitioned and distributed over the computation architecture. The input video stream and the frames are handled by input/output agents. Five processors are shown in the example: P0, P1, P2, P3 and P4. P0 and P1 implement entropy decoding. P2 implements intra-frame prediction, motion compensation prediction and inverse transform. The deblocking filter stage is implemented by the two processors P3 and P4.
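The mapping in this decoder example can be written down as a small table. The sketch below records the stage-to-processor assignment stated above; the dictionary layout and the helper names `stages_on` and `processors_used` are assumptions made for illustration only.

```python
# Stage-to-processor mapping as described for the decoder example.
MAPPING = {
    "entropy decoding": ["P0", "P1"],
    "intra-frame prediction": ["P2"],
    "motion compensation prediction": ["P2"],
    "inverse transform": ["P2"],
    "deblocking filter": ["P3", "P4"],
}


def stages_on(processor, mapping=MAPPING):
    """Return the behavior-model stages mapped onto one processor."""
    return [stage for stage, procs in mapping.items() if processor in procs]


def processors_used(mapping=MAPPING):
    """Return every computational node that receives at least one stage."""
    return sorted({p for procs in mapping.values() for p in procs})


print(processors_used())
print(stages_on("P2"))
```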

Communication Analysis

The behavior model is simulated to generate a communication pattern for the computational nodes. Depending on the abstraction level available for the nodes, we obtain the traffic and other details of each specific node; if a higher level of abstraction is available, it helps to obtain more specific details. The pattern carries different types of information, such as frequencies, sizes and traces. A communication trace is available for every computational node. Every trace has multiple entries, and each entry records the gap between the previous and the current network access, the memory address, and the size of the data. The gap between network accesses is measured in clock cycles, which makes the communication between nodes easier to describe. If blocking occurs, it affects the network access time and also all later network accesses. All of this information helps in designing the ASNoC.
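A trace entry as described above can be sketched as a small record. The field names, the helper functions and the sample values below are assumptions for illustration; the source paper does not give a concrete format.

```python
from dataclasses import dataclass, replace
from itertools import accumulate


@dataclass(frozen=True)
class TraceEntry:
    gap_cycles: int   # clock cycles since the previous network access
    address: int      # memory address of the access
    size_bytes: int   # amount of data transferred


def access_times(trace):
    """Absolute access times (in cycles) recovered from the relative gaps."""
    return list(accumulate(e.gap_cycles for e in trace))


def average_traffic(trace):
    """Average bytes per clock cycle generated by one node."""
    cycles = sum(e.gap_cycles for e in trace)
    return sum(e.size_bytes for e in trace) / cycles if cycles else 0.0


def with_blocking(trace, index, stall_cycles):
    """Delay one access; because gaps are relative, later accesses shift too."""
    blocked = list(trace)
    entry = blocked[index]
    blocked[index] = replace(entry, gap_cycles=entry.gap_cycles + stall_cycles)
    return blocked


# Hypothetical trace for one processor.
trace = [TraceEntry(10, 0x1000, 64), TraceEntry(20, 0x1040, 64), TraceEntry(10, 0x2000, 32)]
print(access_times(trace), average_traffic(trace))
print(access_times(with_blocking(trace, 1, 5)))
```

Storing gaps rather than absolute times is what lets a single blocked access propagate to every later access, as the text notes.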

Fig-7 Communication graph (Design ASNoC for Low-Power SoCs)

A general example of a communication graph is shown in Fig-7. I and O denote the input/output agents and P0 to P4 are processors. The traces provide the information needed to determine the average communication traffic. The graph in Fig-7 is the result of the communication analysis.

Statistical trace modeling

In trace modeling, the collected traces are modeled statistically, and probability distributions are used to capture the communication behavior of every node. This plays an important role in performance analysis. Statistical analysis also makes the design of hardware and software easier, and is useful for predicting the behavior of similar systems.
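One simple way to picture statistical trace modeling is to fit an empirical distribution to the observed gaps and draw synthetic gaps from it. This is only a sketch under that assumption; the source does not say which distributions are used, and the function names and sample data are hypothetical.

```python
import random
from collections import Counter


def fit_empirical(samples):
    """Map each observed value to its relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {value: n / total for value, n in counts.items()}


def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(value * p for value, p in dist.items())


def sample(dist, n, seed=0):
    """Draw n synthetic values that follow the fitted distribution."""
    rng = random.Random(seed)
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


# Hypothetical gaps (in clock cycles) collected from one node's trace.
gaps = [10, 10, 20, 40]
gap_dist = fit_empirical(gaps)
print(gap_dist, mean(gap_dist))
print(sample(gap_dist, 5))
```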

ASNoC architecture and protocol design

The communication graph from the communication analysis, together with the details of the traces and their entries, is used to design the ASNoC architecture and protocol. A regenerating method is used to generate the hierarchy of the ASNoC. Memory is arranged and distributed according to the requirements of the system. Every node has a local memory that stays until the end of that step. The number of hierarchy levels depends on the number of nodes. Communication locality is maximized by using multiple network levels and their network hierarchies. The ASNoC is designed through the following steps.

1) Determine all sets Pn = {G1, G2, ..., Gn}, where G is the communication graph of each node and P is the set of all these graphs.

2) Express P in terms of network cost and find the cost between local networks, denoted by β.

3) Gather the partitions that have the lowest costs.

4) Connect the nodes of each graph G using a switch, and join the graphs using links.

5) Generate a shared memory by combining all local memories.

For the ASNoC architecture, the number of computational nodes is defined by the order of the graphs, and the maximum number of nodes is M, which is the maximum number of switches in the system.
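The partitioning steps above can be sketched as a small exhaustive search: enumerate ways to group the nodes, score each grouping by the traffic that crosses between groups, and keep the cheapest. The cost function named `beta`, the group size limit `max_size`, the tie-break on group count and the traffic values are all assumptions for illustration; the source does not define β precisely.

```python
def partitions(nodes, max_size):
    """Yield every split of nodes into groups of at most max_size nodes."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for part in partitions(rest, max_size):
        # Put `first` into an existing group that still has room ...
        for i, group in enumerate(part):
            if len(group) < max_size:
                yield part[:i] + [[first] + group] + part[i + 1:]
        # ... or start a new group with it.
        yield [[first]] + part


def beta(part, traffic):
    """Traffic that has to travel between different local networks."""
    group_of = {n: i for i, group in enumerate(part) for n in group}
    return sum(w for (a, b), w in traffic.items() if group_of[a] != group_of[b])


def best_partition(nodes, traffic, max_size):
    """Lowest inter-group cost; ties go to the split with fewer groups."""
    return min(partitions(list(nodes), max_size),
               key=lambda p: (beta(p, traffic), len(p)))


# Hypothetical communication graph: edge weights are average traffic.
TRAFFIC = {("A", "B"): 10, ("C", "D"): 10, ("B", "C"): 1}
best = best_partition(["A", "B", "C", "D"], TRAFFIC, max_size=2)
print(sorted(sorted(g) for g in best), beta(best, TRAFFIC))
```

Exhaustive enumeration only works for a handful of nodes; it is used here because it makes the cost comparison in steps 2 and 3 explicit.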

Fig-8 One partition and corresponding NoC architecture (Design ASNoC for Low-Power SoCs)

Fig-8 shows one partition of the NoC architecture. The NoC is a network composed of predesigned library components. Combining all local memories in the NoC creates a distributed shared memory. However, NoC-based memory has a higher communication cost than memory designed with the computational architecture.

Floorplan Estimation

The chip floorplan provides the length of each link, which helps to determine the delay of that link in NoC clock cycles. The chip floorplan is shown in Fig-9.

Fig-9 chip floor plan (Design ASNoC for Low-Power SoCs)

Each link contributes to the area and power of the system, so the link lengths determine the most suitable area and power of the NoC.
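A minimal sketch of how a floorplan could yield link delays, assuming links are routed along Manhattan paths between block centers and that wire delay grows linearly with length. The coordinates, the delay per millimetre and the clock period are hypothetical values, not data from the paper.

```python
import math


def manhattan_length_mm(a, b):
    """Link length between two block centers given as (x, y) in mm."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def link_delay_cycles(length_mm, ps_per_mm, clock_period_ps):
    """Whole clock cycles needed to cross a link (at least one)."""
    return max(1, math.ceil(length_mm * ps_per_mm / clock_period_ps))


# Hypothetical floorplan: centers of two switches in mm.
FLOORPLAN = {"S0": (1.0, 1.0), "S1": (4.0, 3.0)}
length = manhattan_length_mm(FLOORPLAN["S0"], FLOORPLAN["S1"])
cycles = link_delay_cycles(length, ps_per_mm=100, clock_period_ps=400)
print(length, cycles)
```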

Performance Analysis

This step evaluates the performance of the application, not of the network. It determines by simulation the execution or decoding time of one frame, or the number of frames decoded per second. For industrial purposes OPNET is used as the simulator for this step; it is better than other simulators in terms of speed and stability. OPNET uses C or C++ for modeling and runs the simulation using the statistical traces. It provides detailed performance information for each node of the NoC design. Researchers compared ASNoC with RAW in the performance analysis of an SoC application and found that ASNoC completes execution in 50 percent less time than RAW.

Switch and link design

In this step a library of general information is designed that provides details of every network component at any time during system operation. It gives the power, area and cycle accuracy of every component, and includes information about switches, links and other implementations.

Power and area analysis

The library provides the power for every network activity. Power information is collected in the library generation step and is also derived from SPICE simulation. OPNET captures every activity performed in the network, and summing the power of all activities gives the power of the NoC. The library also provides the silicon and metal area of each component, and summing these gives the total area of the NoC.
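The summation described above can be sketched as follows. The library entries, activity names and numbers are hypothetical stand-ins for values that would come from SPICE characterization and from the simulator's activity log.

```python
# Hypothetical library: energy per activity (pJ) and area per component (mm^2).
ENERGY_PJ = {"switch_hop": 1.2, "link_hop": 0.8}
AREA_MM2 = {
    "switch": {"silicon": 0.05, "metal": 0.01},
    "link": {"silicon": 0.002, "metal": 0.02},
}


def noc_power_watts(activity_counts, sim_time_s):
    """Sum the energy of every recorded activity and average it over time."""
    energy_pj = sum(ENERGY_PJ[name] * n for name, n in activity_counts.items())
    return energy_pj * 1e-12 / sim_time_s


def noc_area_mm2(instances):
    """Total (silicon, metal) area over all component instances."""
    silicon = sum(AREA_MM2[c]["silicon"] * n for c, n in instances.items())
    metal = sum(AREA_MM2[c]["metal"] * n for c, n in instances.items())
    return silicon, metal


power = noc_power_watts({"switch_hop": 1000, "link_hop": 2000}, sim_time_s=1e-6)
silicon, metal = noc_area_mm2({"switch": 2, "link": 3})
print(power, silicon, metal)
```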

Researchers have compared ASNoC with the NoC and RAW methodologies for some SoC design applications, and the results show that ASNoC is better than the other methodologies. It uses 39 percent less power, 59 percent less silicon area and 74 percent less metal area. The ASNoC design methodology is therefore very useful for SoC applications.

III-Critical-Path Aware Power Consumption Optimization Methodology (CAPCOM) for low power SoC design

CAPCOM using mixed-VTH (MVT) cells offers advantages in terms of lower power and lower voltage. In the MVTCMOS technique, assigning low-VTH (LVT) devices to critical paths and high-VTH (HVT) devices to non-critical paths reduces the power consumption of the SoC without affecting the speed and performance of the system. Commonly, in MVTCMOS techniques each cell uses either LVT or HVT devices. To lower power consumption further, MVT cells are used together with LVT and HVT cells; however, this increases the complexity of the system. CAPCOM uses mixed-VTH cells in combination with LVT and HVT cells, and the MVT cells are placed on specific design paths to reduce power consumption. In this method, the cell assignment procedure is written in the TCL language and interfaced with the analysis software.

Fig-10 relation between cell and critical path in circuit design (Designs)

Fig-10 shows the relation between cells and critical paths in a circuit design. Cells are allocated to LVT, HVT or MVT according to a critical-path weighted sensitivity, which reduces the power consumption of the circuit. The influence of each cell on the overall circuit timing depends on the particular circuit design. In Fig-10, cell U1 lies on three critical paths and cell U2 on only one. If the LVT/MVT/HVT assignment of U1 is changed, it changes the timing of the overall circuit more, which shows that U1 is more important than U2. In MVTCMOS methodologies, sensitivity is generally used to decide which cells to swap between LVT, MVT and HVT.

Suppose two different cells are available for the ith cell in the circuit, for example an LVT cell and an HVT cell. The sensitivity is then determined by the following formula.

$$S_i = \frac{P_i - P_i'}{D'_{ri} + D'_{fi} - D_{ri} - D_{fi}}$$

Here, Pi and Pi’ are the average power consumption of the ith cell for the two cell versions, and Dri, Dfi and D’ri, D’fi are the corresponding rise and fall propagation delays of the output. Substituting values into the equation gives the sensitivity for the two cells. The equation is then modified with a critical-path term to give the critical path weighted sensitivity, whose equation is the following.

$$\mathrm{CPWS}_i = \frac{1}{C_i} \cdot \frac{P_i - P_i'}{D'_{ri} + D'_{fi} - D_{ri} - D_{fi}}$$

Ci is the number of critical paths passing through the ith cell. The main difference between CPWSi and Si is that CPWSi is multiplied by (1/Ci) to capture the timing influence of each cell on the overall circuit.

Fig-11 cell assignment algorithm flow chart (Designs)
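The two sensitivity measures can be checked with a short numeric sketch. It assumes that the unprimed quantities belong to the LVT cell and the primed ones to the replacement (HVT) cell, and the power and delay figures are hypothetical.

```python
def sensitivity(p_lvt, p_alt, d_lvt, d_alt):
    """S_i: power saved per unit of added rise+fall delay.

    d_lvt and d_alt are (rise, fall) propagation delay pairs.
    """
    delay_increase = (d_alt[0] + d_alt[1]) - (d_lvt[0] + d_lvt[1])
    return (p_lvt - p_alt) / delay_increase


def cpws(p_lvt, p_alt, d_lvt, d_alt, n_critical_paths):
    """CPWS_i: S_i scaled by 1/C_i, the number of critical paths through the cell."""
    return sensitivity(p_lvt, p_alt, d_lvt, d_alt) / n_critical_paths


# Hypothetical cell data: power in nW, delays in ps.
s = sensitivity(10.0, 4.0, (20.0, 22.0), (26.0, 28.0))
cpws_u1 = cpws(10.0, 4.0, (20.0, 22.0), (26.0, 28.0), n_critical_paths=3)
cpws_u2 = cpws(10.0, 4.0, (20.0, 22.0), (26.0, 28.0), n_critical_paths=1)
print(s, cpws_u1, cpws_u2)
```

With identical cells, U1 (three critical paths) gets a lower CPWS than U2 (one critical path), so U2 would be considered for swapping first, which matches the observation that U1 matters more for timing.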

Fig-11 shows the main cell allocation algorithm of the MVTCMOS methodology. It starts with a circuit built entirely from LVT cells. Si/CPWSi is then calculated for swapping LVT cells to HVT cells and is used in the first and second optimization stages. Next, Si/CPWSi is calculated for swapping LVT cells to MVT cells, and the algorithm proceeds to the third and fourth optimization stages. In every optimization stage, the LVT cell with the largest Si/CPWSi is selected. CPWSi is used for selecting cells on critical paths; it causes a problem when selecting cells on non-critical paths, because it is always zero there and leads to an unbalanced circuit, so Si is used for selecting those cells.

Fig-12 optimization of each cell (Designs)

After the LVT cell with the largest Si/CPWSi has been selected, it is swapped with the MVT or HVT cell, and the propagation delay is checked and compared with the overall system performance. If the timing is still met, the algorithm stores the move and continues with the LVT cell that has the second largest Si/CPWSi; if not, the cell is restored to LVT and the algorithm likewise continues with the second largest. That cell is then swapped with the HVT/MVT cell, the circuit delay is determined, and it is decided whether the cell returns to LVT. The procedure continues with the third largest cell and so on until the entire circuit has been analyzed.
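The swap-and-check loop can be sketched in a heavily simplified form. This is not the authors' implementation: it assumes every cell shares one hypothetical library of (power, delay) values, uses a single delay number per cell instead of separate rise and fall delays, treats a path as critical when its all-LVT slack is within a chosen margin, and collapses the four stages into an HVT pass followed by an MVT pass.

```python
# Hypothetical per-variant (power, delay) figures shared by every cell.
LIB = {"LVT": (10.0, 1.0), "MVT": (6.0, 1.3), "HVT": (3.0, 1.6)}


def path_delay(path, assign):
    return sum(LIB[assign[c]][1] for c in path)


def meets_timing(paths, assign, limit):
    return all(path_delay(p, assign) <= limit for p in paths)


def rank(cell, target, paths, limit, margin):
    """CPWS for cells on critical paths, plain S otherwise (as in the text)."""
    p0, d0 = LIB["LVT"]
    p1, d1 = LIB[target]
    s = (p0 - p1) / (d1 - d0)
    all_lvt = {c: "LVT" for p in paths for c in p}
    n_crit = sum(1 for p in paths
                 if cell in p and limit - path_delay(p, all_lvt) <= margin)
    return s / n_crit if n_crit else s


def assign_cells(paths, limit, margin=0.5):
    cells = sorted({c for p in paths for c in p})
    assign = {c: "LVT" for c in cells}          # start from an all-LVT circuit
    for target in ("HVT", "MVT"):               # HVT pass first, then MVT pass
        order = sorted((c for c in cells if assign[c] == "LVT"),
                       key=lambda c: rank(c, target, paths, limit, margin),
                       reverse=True)
        for c in order:
            assign[c] = target                  # try the swap
            if not meets_timing(paths, assign, limit):
                assign[c] = "LVT"               # timing violated: restore LVT
    return assign


def total_power(assign):
    return sum(LIB[v][0] for v in assign.values())


# Hypothetical circuit: U1 sits on two critical paths, U5 on a short path.
PATHS = [["U1", "U2", "U3"], ["U1", "U4", "U6"], ["U5"]]
result = assign_cells(PATHS, limit=3.5)
print(result, total_power(result))
```

In this toy circuit the cell on two critical paths is ranked last and stays LVT, while the cell on the short path goes straight to HVT.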

Researchers analyzed CAPCOM and compared it with other methodologies. CAPCOM was implemented on a 16-bit multiplier circuit: initially, a library with different configurations of LVT, MVT and HVT cells was used and the circuit was tested. Then other MVTCMOS techniques, GDSPOM and CBLPRP, were implemented on the 16-bit multiplier circuit and the results were observed. Fig-13 graphs the power reduction achieved by these MVTCMOS techniques. It indicates that CAPCOM has 44.9 percent higher power reduction compared with the other techniques.

Fig-13 Power reduction optimized by GDSPOM, CBLPRP, CAPCOM (Designs)

IV-Reuse of Intellectual property (IP) and Temperature-Aware SoC for Medical Image Processors

To design a complex multi-gate SoC, designers reuse IP blocks to meet the challenges of performance and power consumption. Because IP blocks are pre-designed and pre-verified, the designer can concentrate on the system level. In practice, however, IP reuse has rarely been as beneficial as expected, ironically because of issues surrounding IP quality.

To meet the SoC design goals, some IPs must be modified before they can be reused in context. This is hard for the IP sourcing team because of the unavailability of sources. In such cases, the IPs are enhanced, the changes are validated, and the IPs are released several times during the SoC design phase.

Here we list a few situations in which the inherited IPs needed to be enhanced. The most frequent changes were related to increasing the test coverage. This was required because these IPs were inherited from sources where meeting a low DPPM had not been important. The enhancements fall into the following categories:

(1) In the inherited IPs, some flip-flops are not testable, which leads to lower coverage. A clock multiplexer must therefore be added to clock these flip-flops in test mode. Here we reduce power consumption by up to 15%.

Fig-14 test clock multiplexing (Soujanna Sarkar)

Here, the functional clock domains are used for test clock multiplexing based on the clock frequency, so that the test clock frequency is less than or equal to the functional clock frequency.

(2) Logic must be added to disable the asynchronous reset in test mode and to enable scan shift. Here, power consumption can be reduced by up to 18-20%.

Fig-15 Disable asynchronous reset (Soujanna Sarkar)

(3) Flip-flops that are negative-edge triggered in functional mode are converted into positive-edge triggered flops. This enables BIST implementation.

Fig-16 clock edge conversion (Soujanna Sarkar)

(4) Some IPs have small memories that do not have BIST capability, so it is necessary to add memory instances.

Table-1 lists some modules with their coverage in the original form and after the changes.

In SoCs used in car radio applications, the different radio peripherals are configured and controlled via the I2C protocol.

Table-1 (Soujanna Sarkar)

High power density leads to on-chip hot spots, which cause thermal hazards for the system. This problem is addressed with the low-cost, low-power thermal packaging often found in SoCs, so temperature is a main focus in system design.

Dynamic voltage scaling (DVS) reduces energy as much as possible without violating the timing constraints. Fig-17 shows the block diagram of the design using the SRAD algorithm. In SRAD, the time constraints are set by the output display frame rate. The workload is proportional to the number of iterations that SRAD needs to achieve a specific S/N ratio.

Fig-17 Block diagram of the SoC design with SRAD algorithm

Convergence can be detected by comparing a number of the most significant bits of the current pixel in each iteration until they converge; only a small window of the image needs to be monitored. The DVS implementation for SRAD takes advantage of the observation that ultrasound medical images change slowly and gradually compared with other imaging systems such as movies. The same iteration number can therefore be used for a number of successive frames until a new value is needed, depending on data convergence. A look-up table is calculated in advance from simulations for different iteration counts, and is used to choose the corresponding VDD and clock frequency.
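The look-up step can be sketched as a table keyed by iteration count. The table entries below (iteration limits, supply voltages and clock frequencies) are hypothetical; in the described flow they would come from simulation.

```python
import bisect

# (max iteration count, VDD in volts, clock in MHz); values are hypothetical.
DVS_TABLE = [(50, 0.8, 100), (100, 1.0, 200), (200, 1.2, 400)]


def operating_point(iterations):
    """Pick the lowest (VDD, clock) pair whose iteration limit covers the workload."""
    limits = [row[0] for row in DVS_TABLE]
    i = bisect.bisect_left(limits, iterations)
    i = min(i, len(DVS_TABLE) - 1)   # clamp to the fastest point
    _, vdd, mhz = DVS_TABLE[i]
    return vdd, mhz


for count in (30, 100, 250):
    print(count, operating_point(count))
```

Because the iteration count changes slowly between ultrasound frames, the chosen operating point can be held for several frames and looked up again only when the count changes.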

Whenever an application exceeds the heat removal capability, the DTM (dynamic thermal management) techniques are invoked until the temperature reaches the desired level. The DTM techniques include clock gating and dynamic voltage and frequency scaling (DVFS).
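The trigger-and-release behavior described above can be sketched as a two-threshold controller. The threshold values and the function names are assumptions for illustration; the actual response (clock gating or DVFS) is left abstract.

```python
def dtm_active(temp_c, active, trigger_c=85.0, desired_c=75.0):
    """Decide whether DTM should be engaged for the next interval.

    DTM turns on when the temperature exceeds trigger_c and stays on
    until the temperature has fallen to desired_c (hypothetical limits).
    """
    if not active:
        return temp_c > trigger_c
    return temp_c > desired_c


def run(temps):
    """Replay a temperature trace and record the DTM state after each sample."""
    active, states = False, []
    for t in temps:
        active = dtm_active(t, active)
        states.append(active)
    return states


print(run([70, 86, 80, 76, 75, 80]))
```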

V-Future Directions

Many other low-power techniques still need to be researched and tested for use in low-power applications. The methodologies presented here also need more functionality to be implemented and evaluated in order to extend their scope.

VI- Conclusions

Various low-power techniques are presented in this paper. They are implemented in SoC-based systems and analyzed, and each comes with different advantages. (1) The design methodology for low-power SoCs is explained with the basic structure of an SoC and the low-power design flow. (2) The ASNoC methodology is presented with a description of every step. (3) CAPCOM is demonstrated with a description of its algorithm and a comparison with other similar techniques. (4) The IP reuse methodology for SoCs is presented with its database structure, IP management and enhancement.

References

1) Zhenyu (Jerry) Qi, Wei Huang, Adam Cabe, Wenqian Wu, Yan Zhang, Garret Rose, Mircea R. Stan, “A Design Methodology for a Low-Power, Temperature-Aware SoC Developed for Medical Image Processors”

2) Jan M. Rabaey, “Low power design methodology and design flow”

3) Jiang Xu, Wei Zhang, Kwai Hung Mo, Zili Shao, “Design ASNoC for Low-Power SoCs”

4) G. J. Y. Lin and C. B. Hsu, “Critical-Path Aware Power Consumption Optimization Methodology (CAPCOM) Using Mixed-VTH Cells for Low-Power SOC Designs”, Dept. of Electrical Engineering, National Taiwan University

5) Soujanna Sarkar, Sanjay Shinde, Subash Chandar G., “An Effective IP Reuse Methodology for Quality System-on-Chip Design”

6) Hu Jian, Shen Xubang, “The Design Methodology and Practice of Low Power SoC”, School of Computer Science and Technology, NorthWestern Polytechnical University

7) Fahad Bin Muslim, Affaq Qamar, Luciano Lavagno, “Low Power Methodology for an ASIC design flow based on High-Level Synthesis”, Department of Electronics and Telecommunication, Politecnico di Torino, ITALY

8) David Elléouet, Nathalie Julien, Dominique Houzet, “A High Level SoC Power Estimation Based on IP Modeling”

9) B. Chung and J. B. Kuo, “Gate-Level Dual-Threshold Static Power Optimization Methodology (GDSPOM) for Designing High-Speed Low-Power SOC Applications Using 90nm MTCMOS Technology”

10) Sunghyun Lee, Sungjoo Yoo, Kiyoung Choi, “An Intra-Task Dynamic Voltage Scaling Method for SoC Design with Hierarchical FSM and Synchronous Dataflow Model”
