mavo: an automated framework for esl design …doemer/publications/cecs_tr_14_12.pdfmonitor,...

15
Center for Embedded and Cyber-Physical Systems University of California, Irvine MAVO: An Automated Framework for ESL Design Monitor, Analyze, Visualize and Optimize Yasaman Samei and Rainer D¨ omer Technical Report CECS-14-12 November 20, 2014 Center for Embedded and Cyber-Physical Systems University of California, Irvine Irvine, CA 92697-2620, USA (949) 824-8919 ysameisy,[email protected] http://www.cecs.uci.edu

Upload: vukhanh

Post on 11-May-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

Center for Embedded and Cyber-Physical SystemsUniversity of California, Irvine

MAVO: An Automated Framework for ESL DesignMonitor, Analyze, Visualize and Optimize

Yasaman Samei and Rainer Domer

Technical Report CECS-14-12November 20, 2014

Center for Embedded and Cyber-Physical SystemsUniversity of California, IrvineIrvine, CA 92697-2620, USA

(949) 824-8919

ysameisy,[email protected]://www.cecs.uci.edu

Page 2: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

MAVO: An Automated Framework for ESL DesignMonitor, Analyze, Visualize and Optimize

Yasaman Samei and Rainer Domer

Technical Report CECS-14-12November 20, 2014

Center for Embedded and Cyber-Physical SystemsUniversity of California, IrvineIrvine, CA 92697-2620, USA

(949) 824-8919

ysameisy,[email protected]://www.cecs.uci.edu

Abstract

Over the last decade, research in Electronic System Level (ESL) design has resulted in significant advancesin addressing the rising design complexity and meeting the required performance constraints. Now a majorconcern of system-level design is the power reduction and energy dissipation of the system-on-chip which notonly affects battery lifetime but also thermal aspects and reliability of the end product. Towards power-awareESL design, we present MAVO, an automated framework to Monitor, Analyze, Visualize and Optimize both powerand performance at the early stages of the design process. Using four techniques for pipeline balancing, DynamicVoltage and Frequency Scaling (DVFS), power scheduling, and peak-power reduction, we minimize and balancethe energy dissipation at the same time as addressing performance constraints. Using an image processingapplication, we demonstrate the benefits of automated powerprofiling and visualization in the MAVO frameworkresulting in a smoothed energy dissipation profile by 29% andwith 16% power reduction without performancepenalty.

Page 3: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

Contents

1 Introduction 1

2 Motivation 22.1 Design Modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . 22.2 Voltage and Frequency Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . 22.3 Balancing power dissipation by scheduling . . . . . . . . . . . . . . . . . . . .. . . . . . . . . 32.4 Smoothing Power Spikes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . 32.5 Power shutoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . 3

3 Related works 3

4 Approach:MAVO Framework 44.1 Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . 44.2 Power and Performance Annotator . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . 54.3 PowerAnalyzer API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . 54.4 Power Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . 6

5 Case Study: Canny Edge Detector 65.1 Canny: Design Modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . 65.2 Canny: Adjusting Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . 75.3 Canny: Power Aware Scheduling . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . 75.4 Canny: Smoothing Power Spikes . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . 7

6 Conclusion & Future Work 8

References 8

ii

Page 4: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

List of Figures

1 Evaluation of different architectures for power and performance . .. . . . . . . . . . . . . . . 22 AdjustingPE clock frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Scheduling Power Dissipation with MAVO framework . . . . . . . . . . . . . . . .. . . . . . 34 Smoothing power spikes with MAVO framework . . . . . . . . . . . . . . . . . . . .. . . . . 35 Design flow with MAVO framework . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . 46 Canny Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . 67 Adjusting Frequency for HW1 and HW2 using MAVO . . . . . . . . . . . . . . .. . . . . . . 78 Adjusting work period for HW1 and HW2 using MAVO . . . . . . . . . . . . . . .. . . . . . 79 Active processes power dissipation in CPU . . . . . . . . . . . . . . . . . . . .. . . . . . . . 710 Smoothing power dissipation using MAVO . . . . . . . . . . . . . . . . . . . . . . .. . . . . 811 Power dissipation of Canny Edge Detector visualized and optimized by MAVO . . . . . . . . . 9

iii

Page 5: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

List of Tables

1 The delay & average power of pipeline stages . . . . . . . . . . . . . . . . . .. . . . . . . . . 62 Timing and Performance for Canny Edge Detector after applying each technique . . . . . . . . 8

iv

Page 6: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

MAVO: An Automated Framework for ESL DesignMonitor, Analyze, Visualize and Optimize

Yasaman Samei and Rainer DomerCenter for Embedded and Cyber-Physical Systems

University of California, IrvineIrvine, CA 92697-2620, USA

ysameisy,[email protected]://www.cecs.uci.edu

Abstract

Over the last decade, research in Electronic SystemLevel (ESL) design has resulted in significant ad-vances in addressing the rising design complexity andmeeting the required performance constraints. Now amajor concern of system-level design is the power re-duction and energy dissipation of the system-on-chipwhich not only affects battery lifetime but also ther-mal aspects and reliability of the end product. To-wards power-aware ESL design, we present MAVO,an automated framework to Monitor, Analyze, Visual-ize and Optimize both power and performance at theearly stages of the design process. Using four tech-niques for pipeline balancing, Dynamic Voltage andFrequency Scaling (DVFS), power scheduling, andpeak-power reduction, we minimize and balance theenergy dissipation at the same time as addressing per-formance constraints. Using an image processing ap-plication, we demonstrate the benefits of automatedpower profiling and visualization in the MAVO frame-work resulting in a smoothed energy dissipation pro-file by 29% and with 16% power reduction withoutperformance penalty.

1 Introduction

ESL design has emerged around a decade ago and hasbeen equipped with different System Level Descrip-tion Languages (SLDL) and Electronic Design Au-tomation (EDA) tools. However, the increasing com-plexity of SoCs extended the design space and conse-

quently intensified the design productivity gap. Be-side the difficulties in managing the size of designspace, there are power, temperature and reliabilityconcerns of the design as well. All these factors, makesystem design a challenging task. The fact that poweroptimization at system level can result in significantreduction up to an order of magnitude compared topower saving at lower levels(e.g. less than 10% atgate level), reveals the necessity for a effective and ef-ficient system level power analysis [9]. System leveldesign decisions can significantly increase the relia-bility and lifetime of a system and enhance any otherpower optimization applied at lower levels later. Themain goal of evaluation at system level is to determinewhether or not the proposed design meets the power,timing, temperature and reliability constraints and toidentify the power saving opportunities. In order toexplore the design space for better options, identifypower saving solutions, detect peak power, and im-prove the reliability, a profound understanding of thepower dissipation behavior of the design is requiredwithin reasonable time. The major design decisionssuch as component selection, scheduling, pipelining,and design configurations are essential to be made assoon as possible in the design process. Since chang-ing them at lower levels is expensive in terms of bothtime and effort. Our proposed MAVO framework isdesigned to answer the need for system-level powerand performance evaluation with minimal effort. Themain contributions of this paper are:

• Provide a framework to monitor the systembehavior in terms of power and performance,

1

Page 7: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

through both power and performance logs, andgraphical visualization over time, in differentpart of HW and SW, as well as different designelements.

• Present a comprehensive analysis of system be-havior, based on power and performance reports

• Apply power management mechanisms and as-sist the designer to investigate further power sav-ing solutions.

MAVO presents power and performance optimiza-tion techniques for design modification, voltage andfrequency scaling, power aware scheduling, and dy-namic power management with shut-off.

2 Motivation

Power and performance are major design concerns,and they directly effect on all other aspect of the de-sign, such as area, temperature, reliability and life-time of the device. However, evaluating and monitor-ing power and performance is a prime design chal-lenge, particularly in multiprocessor SoCs. There-fore, a comprehensive analysis of energy dissipationwithin the system among HWs, communication ele-ments, memories and SW processors is essential andcan be achieved by profiling the simulation and ap-plying power models. The features and functionalityof MAVO are designed to fill this gap, at system level.The power optimization techniques have simple ideabehind them, like voltage and frequency scaling or dy-namic slack reclamation, however, in order to applythe techniques either statically in design phase, or dy-namically during running time, a powerful platformis required to investigate the design rapidly and withadequate details.

2.1 Design Modification

Multiple techniques have been proposed for optimiz-ing power dissipation. However, a low power designis mainly efficient due to its architecture and designmodel itself rather than the applied optimization tech-niques. For instance, the effect of having a systemworking in form of a pipeline configuration and bal-ancing the pipeline stages, cannot be made by apply-ing a power optimization technique, such as DVFS.

BehaviorC

BehaviorD

BehaviorB

BehaviorA

(a) Original design

BehaviorD

BehaviorC

BehaviorA

Sub-BehaviorB1

Sub-BehaviorB2

(b) Modified design

Figure 1: Evaluation of different architectures forpower and performance

Fig.1 shows two different design options of a design.The design in Fig.1(a) has 4 pipeline stages,A,B,CandD. After monitoring the power dissipation withindifferent behaviors and availability of processing el-ements, the pipeline stages, and weight of their as-signed behaviors, design has been modified as it isshown in Fig.1(b). Fig.1(b) shows an alternative withsplit stageB (B1 andB2) and merged stageC andD.Without an infrastructure to monitor and profile theperformance and power in each stage, it is impossibleto apply these modification and decide which archi-tecture is more efficient.

2.2 Voltage and Frequency Scaling

The fact that power is basically spending energy overtime allows design optimization with respect to fre-quency, and supply voltage. We can reduce power dis-sipation and as a result develop more reliable designsby lowering the frequency or supply voltage withinthe defined deadline and without compromising theperformance. Fig.2 illustrates the general idea of thisscheme. The working frequency ofPEi is reduced

tstart tstarttdeadline tdeadline

power

power

PEi

PEi

Figure 2: AdjustingPE clock frequency

to minimize power dissipation, while meeting the re-

2

Page 8: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

quested deadline.

2.3 Balancing power dissipation by schedul-ing

Throughout the life-time of a device it is importantto balance power dissipation. This can effectively re-duce the working temperature of the device, improvereliability, minimize faults, and extend the systemlife-time. MAVO supports monitoring the mode and

PEj

tstart tstarttdeadline tdeadline

power

power

PEi

PEj

PEi

Figure 3: Scheduling Power Dissipation with MAVOframework

the activity intervals of each design element, as wellas the amount of their power consumption. Using thisinformation, designer can easily examine schedulingalternatives and power saving opportunities via sim-ple, yet effective design modification. Fig.3 demon-strates this by improved scheduling of the workingintervals of two processing element,PEi andPEj , tobalance the overall power usage, and reducing peaksand temperature at the same time.

2.4 Smoothing Power Spikes

The peak power of the design is among the factors thatdirectly influences the reliability, thermal limitations,cost and size of the device [8]. Fig.4 illustrates thegeneral idea of eliminating low and high spikes. Theunwanted power dissipation behavior can be avoidedby scaling frequency within the involve units. Inan ideal design, peak power should be limited to cer-tain range. In MAVO we are using a simple methodto monitor different active process of the design andscale down the frequency, in order to avoid out ofrange peak power.

ti tj

power

power

PEi PEi

ti tj

Figure 4: Smoothing power spikes with MAVOframework

2.5 Power shutoff

Finally, in order to reduce static power, a commonDynamic Power Management(DPM) technique is toshutoff the inactive devices. MAVO also supportsthis approach.

3 Related works

There has been large body of work and research ef-forts on low-power design, power optimization tech-niques, and power aware EDA tools, for design atdifferent levels of abstraction. For power and perfor-mance estimation at system level, two common stepswithin all these studies are: generating power modelsand tracing model simulation to extract informationneeded by power models. A functional level poweranalysis approach is used in PETS [10]. PETS usesgeneric power models while extracting micro archi-tectural activity to tackle the accuracy-speed trade-off. COMPLEX [5] is a framework for HW/SW co-design at system levels and allows applying hybridcombination of power models from various works fordifferent design components. Wattch [1] and Simple-Power [14] are cycle-accurate power estimators withlow speed. A power, performance and area estima-tor with built-in power models for all types of HWunits is presented in McPAT [7]. McPAT has to beused along with a simulator as well. Although powermodels can work with negligible error, the main lim-itation at system level is to rapidly collect detailedpower traces for different applications. To alleviatethis problem, simulators Sniper [2] as an interval sim-ulator and Multi2Sim [13] as functional simulator are

3

Page 9: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

proposed. There are also works that extend SLDLswith additional libraries and allows the designer to in-sert power data/functions to design manually, such asPowerSC [6] and TLM POWER3 [4].These power estimators at system level generate gen-eral power reports in form of average power consump-tion, performance, or the trace of total power of thedesign only. However, our MAVO framework en-ables the designer to concentrate on any HW or SWpart of the design, their working intervals, their powerconsumption over time, and modes of operation, withuser-defined granularity. This feature allows to makecritical design decisions and find power optimizationsolutions easy and rapidly with clear understanding ofthe design.

4 Approach:MAVO Framework

An overview of the design flow using MAVO ispresented in Fig.5. The main developed compo-nents consist of aMonitor [11], PowerAnalyzerAPI[12], power-time modelAnnotator, and an interac-tive power-performanceOptimizer. We developed theMonitor and thePowerAnalyzerAPI [12], and eval-uated them in terms of accuracy and fidelity for ESLpower estimation [11]. Our results confirms that theMonitor along with thePowerAnalyzerdeliver rapidestimates with high fidelity and at minimal cost. Inthis work we automated the monitoring and poweranalysis. We also integrated the visualization and in-teractive optimization capabilities.As it is shown in Fig.5, the design process at sys-tem level starts with a specification model, that re-flects the functional behavior of the system, withoutany notion of time nor power. Next, Processing Ele-

B2C1

C2 C4

C3B3

B1

B4B5

System-Level Model

CPU Mem

IPHW

CPU Bus DSP Bus

B1 B3V1v2

B5B4

DSP

c4c

3

s

1

OSos

B2

Bri

dg

e

Architecture Model

Power & TimeAnnotated model

CPU Mem

IPHW

CPU Bus DSP Bus

B1 B3V1v2

B5B4

DSP

c4c3s1

OSos

B2

Brid

ge

Simulation

Power Report po

we

r

time

po

we

r

time

Monitor

PowerLibrary

TimingLibrary

PowerAnalyzer

API

Annotator

Power Optimization

- Design Modification- DPM- Adjusting Metrics- Balance peak power- Alter mapping- Alter PEe

Figure 5: Design flow with MAVO framework

ment (PE) allocations, behavior mappings, schedul-

ing and communication refinements are performed.All these design decisions generate a large designspace which needs to be minimized through elimi-nation, based on design constraints. All invalid de-sign options are pruned from the design space and thebest design options go to power optimization and per-formance improvement phase. In this work, we im-plemented the system level specification models inSpecC language, and the System-on-Chip Environ-ment (SCE) [3] is used for component allocation, ar-chitecture mapping, and refinements. Although wepicked SpecC language and SCE as the design en-vironment, MAVO can be used in SystemC or otherdesign frameworks similarly. Moreover, each moduleof MAVO, Monitor, PowerAnalyzerAPI, Annotator,and power-performanceOptimizer, can be used sepa-rately as needed or integrated into other tools as well.Once the architecture model is ready, the subsequentsteps are profiling the model, annotating power andtime related functions to the design, and finally thepower and performance analysis.

4.1 Monitor

The system specification model represents the func-tionality of the system. The system level design lan-guages allow the designer to specify the design withtiming notion, different communication attributessuch as channels, queues, and the format of behav-iors executions such as parallel, pipeline, sequentialor finite state machines . For power and timing anal-ysis, designer needs to specify the structural model inwhich processing elements, communication elements,and memories are allocated, and mappings are definedfor specification model components. In order to gen-erate the power and timing reports and perform anal-ysis,Monitor [11] profiles the trace of different oper-ations executed on processing elements and memoryaccesses, besides the amount and type of data beingtransferred over the channels. These traces can becollected with different levels of granularity, for in-stance, with highest granularity for the whole design,or lower, the trace of every component and commu-nication channel of the design. In this work, we aremonitoring with granularity of every basic block. Thebasic block in this work is defined as in compilersdesign. The traces of operations, data accesses, and

4

Page 10: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

transferred data are generated dynamically throughsimulation of system level architecture model.

4.2 Power and Performance Annotator

In order to perform power and timing aware simula-tion, and design exploration, the collected traces areannotated to the structural model. In order to maintainaccuracy, the traces are associated with every basicblock of the model. Therefore, anAnnotator is de-signed to insert power and timing information. Nev-ertheless, theAnnotator can support the annotationwith higher granularity of behaviors and componentsas well. The back annotated information includes ex-ecution delay, static, and dynamic power dissipatedwithin the corresponding basic block, considering thetype of the design component that block is mapped to,its configured operational mode, as well as the com-munication transactions through the assigned commu-nication unit. In order to process these values,Power-AnalyzerAPI [12] is applied. TheAnnotatoris linkedto PowerAnalyzerAPI in order to apply the powerfunctions and use generated values for annotations.An example of a basic block annotated with time andpower information is presented below.

{ / / b a s i c b l o ck : Bi{

Label i :wa i t f o r t ime ;d i s s i p a t e( PowerMeterA , Dynamic Power , S t a t i c

Power ) ;}. . .d ++;Ch1 . send ( d ) ;

}

As shown, thedissipate function represents theamount of power spent in basic blockBi , in formof dynamic and static power, as well as thePower-MeterA that monitorsBi power dissipation. MAVOcan support the dynamic and static power monitor-ing separately, so the designer can focus on any ofthem as needed. The static power represents leakagepower, and because of shrinking in transistor size tosub-micron, static power needs to be investigated ascarefully as dynamic power.

4.3 PowerAnalyzer API

ThePowerAnalyzerAPI is developed to complementsystem level models with power notion.PowerAn-alyzer API [12] is implemented inC++ and can beused along with any ANSI-C based SLDL. In thisAPI, a set of power related functions is provided toadd the dimension of power to system level design.Using the provided functions, the user can specifypower related activities and analyze power dissipationin different processing elements, communication ele-ments, behaviors, and globally, both manually or au-tomatically using theAnnotator.In order to analzye power consumption, PowerMetersare used. Power dissipation is measured using Power-Meters and can be evaluated for any subsection of thedesign. As shown in last section, the dissipate func-tion captures the dynamic and static power dissipa-tion of a block which is monitored by aPowreMeterA.PowerMeters can get allocated for any component,behavior or block of the design automatically. How-ever, designer can define more PowerMeters for anypart of the design for further analysis if needed. ThePowerMeters are mainly useful for post-simulationpower analysis. For effective power analysis and im-proving the reliability of system, designer need tomonitor global power consumption of the system forpeak powers, and temperature analysis. However, forpower optimization investigation and balancing peakpower, detailed power dissipation traces of design el-ements, beside behaviors status, is required. For in-stance, in a design with multiple processing elements,in order to balance the power consumption within el-ements and schedule their mapped behaviors, it is re-quired to monitor elements activity intervals as wellas the amount of their power consumption, apart fromprofiling entire design power dissipation. Similarly,in order to identify a solution for smoothing a peakpower in certain processing element, designer needto study active behaviors on that element during thepeak.To generate a comprehensive analysis of power andperformance, thePowerAnalyzerAPI andAnnotatorare working alongside to use monitored power loginformation, and annotate power and timing relatedfunctions generated byPowerAnalyzerAPI. To calcu-late the dynamic and static power values, user need to

5

Page 11: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

provide power model and configuration files of eachelement.

4.4 Power Optimizer

The powerOptimizer is an infrastructure for closeevaluation and analysis of the design, through powerreports, and identifying power and performance op-timization opportunities. These opportunities canbe in form of design alteration e.g. changing theweight of computations in different blocks of the de-sign, altering algorithms, changing execution meth-ods like parallelism or pipelining, communicationpolicies, components allocations, and PE mappings.The other group of power saving solutions, can bepower optimization methods such as dynamic voltageand frequency scaling, dynamic power management,scheduling or load balancing.The main role ofOptimizer is to assist the designerwith optimization decisions.Optimizersupports gen-erating power and performance analysis for any timeinterval or subsection of the design to allow the de-signer to evaluate the design and explore other designoptions rapidly. For frequency scaling, schedulingand balancing peak power,Optimizer can help fur-ther and show the working intervals of each element,and involved design elements in peak power.TheOp-timizerassesses PowerMeters and provides numericallogs of power over time. In order to control the sizethese log files and adjust the precision of this analysis,the user can pick the sampling frequency. The usercan also specify any simulation interval to monitoras well. Most importantly, the user can view graphi-cal power dissipation over time and zoom in for spe-cific intervals of any design elements and behaviors.Moreover, theOptimizersupport merging the reportsor stacking up the power dissipation values in differ-ent PowerMeters over time.

5 Case Study: Canny Edge Detector

We have investigated the MAVO framework with aCanny edge detector. Canny is a real-life image pro-cessing application implemented in 4-stage pipelineconfiguration. The model was examined on an ARMbased processor with two custom HW units for in-

put and output, and 2 HWs for pipeline. All units arecommunicating through the AMBA BUS.

5.1 Canny: Design Modification

The 4-stage pipeline architecture is suggested by edgedetection algorithm which has four major steps. How-ever, after viewing the power and timing reports fromMAVO, it become apparent that this architecture isimperfect in term of the pipeline load in each stage.Table 1 shows the power and time consumption ofeach pipeline stage.

DesignStage1 Stage2 Stage3 Stage4 Stage5

Time(ms)

Power(mW)

Time(ms)

Power(mW)

Time(ms)

Power(mW)

Time(ms)

Power(mW)

Time(ms)

Power(mW)

4-Stage 537 328.3 184.4 77.1 353 72.5 142 38.5 - -5-Stage 226 174.6 237.8 149.6 184.4 77.1 353 72.5 142 38.53-Stage 226 174.6 237.8 149.6 688.8 188.2 - - - -

Table 1: The delay & average power of pipeline stages

Data In

MagnitudeDelta

Non-maximalSuppression

ApplyHysteresis

Canny

GaussianSmooth

Q1

Data OutQ2

(a) 4-stage pipeline

Data In

GaussianSmooth Y

MagnitudeDelta

Non-maximalSuppression

ApplyHysteresis

Canny

GaussianSmooth X

Q1

Data OutQ2

(b) 5-stage pipeline

Data In

GaussianSmooth Y

MagnitudeDelta

Non-maximalSuppression

ApplyHysteresis

Canny

GaussianSmooth X

Q1

Data OutQ2

(c) 3-stage pipeline

Figure 6: Canny Architecture

The power and timing results reveal that theGaus-sian Smoothbehavior is computationally expensiveand power hungry. In order to balance the pipelinewe modified the design to a 5-stage pipeline (Can-nyA), splitting the Gaussian Smooth behavior in toXandY dimensions. In this work the stage 1 and stage2 of the pipeline are mapped to custom HWs, and restof the stages are mapped to ARM processor. The ap-plied mappings, makes the Canny architecture worksas a 3-stage pipeline configuration. Fig.6 shows thearchitecture of Canny before and after the modifica-tion. Next we evaluate Canny for power optimization.Fig.11(a) shows the power dissipation in each of thedesign elements.

6

Page 12: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

5.2 Canny: Adjusting Frequency

Using the reports and graphs, the working frequencyand supply voltage of each unit can be optimized. InFig.11(a), a power saving opportunity can be detectedfor HW1 and HW2, which finish their tasks earlierthan the rest of stages. In turn, we lower the frequencyof HW1 and HW2 within the performance constraints(CannyB).

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

HW1HW2

(a) Before

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

HW1HW2

(b) After

Figure 7: Adjusting Frequency for HW1 and HW2using MAVO

By extending the processing time in stage 1 and 2,the simulation time gets extended as well. This isdue to the fact that initially it takes longer to fillthe pipeline, however, the pipeline throughput perfor-mance remains the same.

5.3 Canny: Power Aware Scheduling

In order to balance power dissipation in whole device,we can schedule the work period of the units such thatthey have minimum overlaps. For HW1 and HW2

the result of this modification (CannyC) is shown inFig.8.

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

HW1HW2

(a) Before

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

HW1HW2

(b) After

Figure 8: Adjusting work period for HW1 and HW2using MAVO

0

0.1

0.2

0.3

0 1e+09 2e+09 3e+09 4e+09 5e+09

power(W)

time(ns)

ApplyHysteresisDerivative

MagnitudeXYMAX

Figure 9: Active processes power dissipation in CPU

5.4 Canny: Smoothing Power Spikes

The total power results from MAVO show that thedevice is experiencing high peak powers, where the

7

Page 13: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

ARM

(a) Before

0

0.1

0.2

0.3

0 2e+09 4e+09

power(W)

time(ns)

ARM

(b) After

Figure 10: Smoothing power dissipation usingMAVO

peak is more than the double the average. In order tosmoothen the dissipation MAVO identifies the activeprocesses during the peaks as shown in Fig.9.

Here we decide to scale the frequency for the in-volved behaviors. Fig.10 shows the results (CannyD).A block performing the floating point operations is re-sponsible for power peaks. We lower the frequencyof CPU here and in order to maintain the perfor-mance, another integer intensive behavior is scaledwith higher frequency instead. Table 2 shows thepower and performance of each models. The cannyexample has been tested with 6 images and it was ex-pected to generated one image in every 0.8 seconds.As shown, MAVO power savings resulted in an opti-mized design with no performance penalty. The opti-mized design experiences power fluctuations 8% less,based on comparing the standard deviation of powerreports and the power changes range has been reducedby 29%. The difference between the minimum andthe maximum of power dissipation over time, consid-ered as the power changes range of the design.

Model

Pipeline

Throughput

(ms)

Power

Range

(min,max)

Relative

Fluctuation

Power

(mW)

Power

Saving

CannyA 688.8/800 (86%)(0.003,0.312) 100% 374.504 +0%

CannyB 689/800 (86%) (0.003,0.312) 100% 357.113 +5%

CannyC 701.8/800 (88%)(0.003,0.312) 100% 329.903 +12%

CannyD 738.8/800 (92%)(0.001,0.204) 71% 315.511 +16%

Table 2: Timing and Performance for Canny EdgeDetector after applying each technique

6 Conclusion & Future Work

This paper presents a framework toMonitor, Analyze,Visualizeand Optimizepower and performance forlow power ESL design. MAVO is a simulation basedpower and performance estimator and optimizer. AMonitor for profiling the system model simulation,an API calledPowerAnalyzerfor computing powernumbers using power models, a modelAnnotatorforback annotating the power values automatically, andanOptimizerto apply power optimization techniques,are developed and integrated in MAVO.The Canny edge detector has been studied usingMAVO, and it is optimized up to 16% for power with-out compromising performance. The power range hasbeen reduced by 29%. Future work will address in-tegrating an efficient dynamic power manager withthermal and reliability analyzer.

References

[1] David Brooks, Vivek Tiwari, and MargaretMartonosi. Wattch: a framework forarchitectural-level power analysis and optimiza-tions, volume 28. ACM, 2000.

[2] Trevor E Carlson, Wim Heirman, and LievenEeckhout. Sniper: exploring the level of abstrac-tion for scalable and accurate parallel multi-core simulation. InProceedings of 2011 In-ternational Conference for High PerformanceComputing, Networking, Storage and Analysis,page 52. ACM, 2011.

[3] Rainer Domer, Andreas Gerstlauer, Junyu Peng,Dongwan Shin, Lukai Cai, Haobo Yu, Samar

8

Page 14: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 1e+09 2e+09 3e+09 4e+09 5e+09

power(W)

time(ns)

AMBA_ARM__HW1AMBA_ARM__HW2

AMBA_ARM__IO-OUTAMBA_ARM__IO-IN

ARMHW1HW2

(a) Canny edge detector

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 1e+09 2e+09 3e+09 4e+09 5e+09

power(W)

time(ns)

AMBA_ARM__HW1AMBA_ARM__HW2

AMBA_ARM__IO-OUTAMBA_ARM__IO-IN

ARMHW1HW2

(b) Optimized Canny Edge Detector

Figure 11: Power dissipation of Canny Edge Detector visualized and optimized by MAVO

Abdi, Daniel D Gajski, et al. System-on-chipenvironment: a SpecC-based framework for het-erogeneous MPSoC design.EURASIP Journalon Embedded Systems, 2008.

[4] David Greaves and Mehboob Yasin. TLMPOWER3: Power estimation methodology forSystemC TLM 2.0. InModels, Methods, andTools for Complex Chip Design, pages 53–68.Springer, 2014.

[5] Kim Gruttner, Philipp A Hartmann, Kai Hylla,Sven Rosinger, Wolfgang Nebel, Fernando Her-rera, Eugenio Villar, Carlo Brandolese, WilliamFornaciari, Gianluca Palermo, et al. The com-plex reference framework for hw/sw co-designand power management supporting platform-based design-space exploration.Microproces-sors and Microsystems, 37(8):966–980, 2013.

[6] Felipe Klein, Rodolfo Azevedo, Luiz Santos,and Guido Araujo. Systemc-based power eval-uation with PowerSC.Electronic System LevelDesign, pages 129–144, 2011.

[7] Sheng Li, Jung Ho Ahn, Richard D Strong,

Jay B Brockman, Dean M Tullsen, and Nor-man P Jouppi. McPAT: an integrated power,area, and timing modeling framework for mul-ticore and manycore architectures. InMi-croarchitecture, 2009. MICRO-42. 42nd AnnualIEEE/ACM International Symposium on, pages469–480. IEEE, 2009.

[8] Massoud Pedram. Power minimization in ic de-sign: principles and applications.ACM Trans-actions on Design Automation of Electronic Sys-tems (TODAES), 1(1):3–56, 1996.

[9] Jan Rabaey. Low power design essentials.Springer, 2009.

[10] Santhosh-Kumar Rethinagiri, Oscar Palomar,Osman Unsal, Adrian Cristal, Rabie Ben-Atitallah, and Smail Niar. Pets: Power and en-ergy estimation tool at system-level. InQual-ity Electronic Design (ISQED), 2014 15th Inter-national Symposium on, pages 535–542. IEEE,2014.

[11] Yasaman Samei and Rainer Domer. AutomatedEstimation of Power Consumption

9

Page 15: MAVO: An Automated Framework for ESL Design …doemer/publications/CECS_TR_14_12.pdfMonitor, Analyze, Visualize and Optimize ... (DVFS), power scheduling, ... An Automated Framework

for Rapid System Level Design. InPerformanceComputing and Communications Conference(IPCCC), 2014 IEEE 33rd International. IEEE,2014.

[12] Yasaman Samei and Rainer Domer. PowerMon-itor: A Versatile API for Automated Power-Aware ESL Design. InSpecification & DesignLanguages (FDL), 2014 Forum on. IEEE, 2014.

[13] R Ubal, J Sahuquillo, S Petit, and P Lopez.Multi2sim: A simulation framework to evaluatemulticore-multithread processors. InIEEE 19thInternational Symposium on Computer Archi-tecture and High Performance computing, page(s), pages 62–68, 2007.

[14] Wu Ye, Narayanan Vijaykrishnan, MahmutKandemir, and Mary Jane Irwin. The design anduse of simplepower: a cycle-accurate energy es-timation tool. InProceedings of the 37th AnnualDesign Automation Conference, pages 340–345.ACM, 2000.

10