PAPER PRESENTED ON: DISTRIBUTED LOW POWER EMBEDDED SYSTEM

8/13/2019 Distributed Low Power Embedded System

http://slidepdf.com/reader/full/distributed-low-power-embedded-system 1/15

 


ABSTRACT

A multiple-processor system can potentially achieve higher energy savings than a single processor, because the reduced workload on each processor creates new opportunities for dynamic voltage scaling (DVS). However, as the cost of communication starts to match or surpass that of computation, many new challenges arise in making DVS effective in a distributed system under communication-intensive workload. This paper discusses implementation issues for supporting DVS on distributed embedded processors. We implemented and evaluated four distributed schemes: (1) DVS during I/O, (2) partitioning, (3) power failure recovery, and (4) node rotation. We validated the results on a distributed embedded system with the Itsy pocket computers connected by serial links. Our experiments confirmed that a distributed system can create new DVS opportunities and achieve further energy savings. However, a surprising result was that aggregate energy savings do not translate directly into a longer battery life. In fact, the best partitioning scheme, which distributed the workload onto two nodes and enabled the most power-efficient CPU speeds at 30-50%, resulted in only a 15% improvement in battery lifetime. Of the four techniques evaluated, node rotation showed the most measurable improvement to battery lifetime, at 45%, by balancing the discharge rates among the nodes.


1 Introduction

Dynamic voltage scaling (DVS) is one of the most studied topics in low-power embedded systems. Based on CMOS characteristics, the power consumption is proportional to V^2, while the supply voltage V is linearly proportional to the clock frequency. To fully exploit such quadratic power vs. voltage scaling effects, previous studies have extensively explored DVS with real-time and non-real-time scheduling techniques. As DVS reaches its limit on a single processor, researchers turn to multiple processors to create additional opportunities for DVS. Multiple processors can potentially achieve higher energy savings than a single processor. By partitioning the workload onto multiple processors, each processor is now responsible for only a fraction of the workload and can operate at a lower voltage/frequency level with quadratic power savings. Meanwhile, the lost performance can be compensated by the increased parallelism. Another advantage of a distributed scheme is that heterogeneous hardware such as DSPs and other accelerators can further improve the power efficiency of various stages of the computation through specialization. Although tightly-coupled, shared-memory multiprocessor architectures may have more power/performance advantages, they are not as scalable as distributed, message-passing schemes.

While distributed systems have many attractive properties, they pay a higher price for message-passing communications. Each node now must handle not only I/O with the external world, but also I/O on the internal network. Programming for distributed systems is also inherently more difficult than for single processors. Although higher-level abstractions have been proposed to facilitate distributed programming, these abstraction layers generate even more inter-processor communication traffic behind the scenes. While this may be appropriate for high-performance cluster computers with multi-tier, multi-gigabit switches like Myrinet or Gigabit Ethernet, such high-speed, high-power communication media are not realistic for battery-powered embedded systems. Instead, the low-power requirement has constrained the communication interfaces to much slower, often serial interfaces such as I2C and CAN. As a result, even if the actual data workload is not large on an absolute scale, it appears expensive relative to the computation performance that can be delivered by today's low-power embedded microprocessors. The effect of I/O on embedded systems has not been well studied in existing DVS works. Many existing DVS techniques have shown impressive power savings on a single processor. However, few results have been fully qualified in the context of an entire system. Even fewer have been validated on actual hardware. One common simplifying assumption is to ignore I/O.
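The quadratic scaling argument above can be made concrete with a little arithmetic. The following is our illustration, not code from the paper: if V scales roughly linearly with f, dynamic power scales as f * V^2, i.e. roughly f^3, while a fixed workload takes 1/f longer, leaving energy per task proportional to f^2.

```python
# Idealized DVS scaling sketch (our illustration, not from the paper):
# with V proportional to f, dynamic power P ~ f * V^2 ~ f^3, and a fixed
# cycle count takes time ~ 1/f, so energy per task scales as f^2.
def relative_energy(f_scale: float) -> float:
    """Energy per task at frequency f_scale, relative to full speed = 1.0."""
    power = f_scale ** 3      # P ~ f * V^2, with V ~ f
    time = 1.0 / f_scale      # fixed workload runs longer at lower f
    return power * time       # energy ~ f^2

# Halving the clock (and voltage) quarters the energy per task.
print(relative_energy(0.5))   # 0.25
```

This is only the idealized upper bound; as the experiments later in the paper show, I/O costs and battery behavior erode it considerably.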

2 Related Work

Real-time scheduling has been extended to DVS scheduling on variable-voltage processors. The initial scheduling model was introduced by Yao et al [10], then extended and refined by Yasuura [6], Yuan [7] and many studies in variations of real-time scheduling problems. Since power is a quadratic function of the supply voltage, lowering the voltage can result in significant savings while still enabling the processor to continue making progress such that the tasks can be completed before their deadlines. These techniques often focus on the energy reduction of the processor only, while the power consumption of other components, including memory and I/O, is ignored. The results are rarely validated on real hardware. DVS has been applied to benchmark applications such as JPEG and MPEG in embedded systems. Im et al [4] propose to buffer the incoming tasks such that the idle period between task arrivals can be utilized by DVS. Shin et al [8] introduce an intra-task DVS scheme that maximally utilizes the slack time within one task. Choi et al [2] present a DVS scheme in an MPEG decoder by taking advantage of the different types of frames in the MPEG stream. These techniques can be validated with measurement on real or emulated platforms. However, they are also computation-oriented such that the processor performs only very little, if any, I/O. The impact of I/O still remains under-studied. DVS has recently been extended to multi-processor systems. Weglarz [9] proposes partitioning the computation onto a multi-processor architecture that consumes significantly less power than a single processor. There is a fundamental difference in applying techniques for multiprocessors to distributed systems. Minimizing the global energy consumption will extend the battery life only if the whole system is assumed to be powered by a single battery unit. In a distributed environment, each node is powered by a dedicated battery. Even a globally optimal solution may cause poor battery efficiency locally and result in shortened system uptime as well as loss of battery capacity. Maleki et al [5] analyze the energy efficiency of routing protocols in an ad-hoc network and show that globally optimal schemes often contradict the goal of extending the lifetime of a distributed system.

3 Motivating Example

We select an image processing algorithm, automatic target recognition (ATR), as our motivating example to evaluate a realistic application under I/O pressure. Its block diagram is shown in Fig. 1. The algorithm is able to detect pre-defined targets on an input image. For each target, a region of interest is extracted and filtered by templates. Finally, the distance of each target is computed. A throughput constraint is imposed such that the image frames must be processed at a fixed rate. We evaluate a few DVS schemes on one or more embedded processors that perform the ATR algorithm. Unlike many DVS studies that ignore I/O, we assume that all nodes in our system are connected to a communication network. This network carries data from external sources (e.g., a camera, sensor, etc.), internal communications between the nodes, and data to an external destination (e.g., a PC). This study assumes only one image and one target are processed at a time, although a multi-frame, multi-target version of the algorithm is also available. We refer to each embedded


processor as a node. A node is a full-fledged computer system with a voltage-scalable processor, I/O devices, and memory. Each node performs a computation task PROC and two communication tasks RECV and SEND. RECV receives data from the external source or another node. The data is processed by PROC, which consists of one or more functional blocks of the ATR algorithm. The result is transmitted by task SEND to another node or the destination. Due to data dependencies, tasks RECV, PROC, and SEND must be fully serialized on each node. In addition, they must complete within a time period called the frame delay D, which is defined as the performance constraint. Fig. 2 illustrates the timing vs. power diagram of a node.

When multiple nodes are configured as a distributed system, we organize them in a pipeline for the ATR algorithm. Fig. 3 shows the timing vs. power diagram of two pipelined nodes performing the ATR algorithm. Node1 maps the first two function blocks, and Node2 performs the other two blocks. Node1 receives one frame from the data source, and it processes the data and sends the intermediate result to Node2 in D seconds. After Node2 starts receiving from Node1, it finishes its share of computation and sends the final result to the destination within D seconds. Fig. 3 shows that if the data source keeps producing one frame every D seconds, and both Node1 and Node2 can send their results also in D seconds, then the distributed pipeline is able to provide one result in every D seconds to the destination.
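The pipeline timing above can be sketched as a toy model (ours, not the authors' code), assuming each node's serialized RECV, PROC, and SEND fit exactly within the frame delay D:

```python
# Toy two-stage pipeline model (our illustration): the source emits one
# frame every D seconds, and each node spends exactly D seconds per frame
# on its serialized RECV + PROC + SEND.
D = 2.3  # frame delay in seconds (the performance constraint)

def completion_times(num_frames: int, d: float = D):
    """Time at which each frame leaves the two-node pipeline."""
    out = []
    stage1_free = 0.0
    stage2_free = 0.0
    for i in range(num_frames):
        arrival = i * d                      # source produces one frame per d
        t1 = max(arrival, stage1_free) + d   # Node1: RECV + PROC + SEND
        stage1_free = t1
        t2 = max(t1, stage2_free) + d        # Node2: RECV + PROC + SEND
        stage2_free = t2
        out.append(t2)
    return out

times = completion_times(5)
# After the pipeline fills, results arrive one per D seconds.
gaps = [round(b - a, 6) for a, b in zip(times, times[1:])]
print(gaps)  # [2.3, 2.3, 2.3, 2.3]
```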


4 Experimental Platform

We use the Itsy pocket computers as distributed nodes, connected by a TCP/IP network over serial links. We present the performance and power profiles of the ATR algorithm running on Itsy computers and define the metrics for our experiments.

4.1 The Itsy Pocket Computer

The Itsy pocket computer is a full-fledged miniaturized computer system developed by Compaq Western Research Lab [1, 3]. It supports DVS on the StrongARM SA-1100 processor with 11 frequency levels from 59 to 206.4 MHz over 4 different voltage levels. Itsy also has 32MB flash memory for off-line storage, 32MB DRAM as the main memory, and a RAM disk. The power supply is a 4V lithium-ion battery pack. Due to the power density constraint of the battery, Itsy currently does not support high-speed I/O such as Ethernet or USB. The applicable I/O ports are a serial port and an infra-red port. Itsy runs Linux with networking support. Its block diagram is shown in Fig. 4.

4.2 Network Configuration

We currently use the serial port as the network interface. We set up a separate host computer as both the external source and destination. It connects to the Itsy nodes through multiple serial ports established by USB/serial adaptors. We set up individual PPP (point-to-point protocol) connections between each Itsy node and the host computer. Therefore the host computer acts as the hub for multiple PPP networks, and it assigns a unique IP address to each Itsy node. Finally, we start the IP forwarding service on the host computer to allow Itsy nodes to communicate with each other transparently as if they were on the same TCP/IP network. The network configuration is shown in Fig. 5.

The serial link might not be the best choice for interconnect, but it is often used in real life due to power constraints. A high-speed network interface requires several watts of power, which is too high for battery-powered embedded systems such as Itsy. In this paper, our primary goal is to investigate the new opportunities for DVS-enabled power vs. performance trade-offs in distributed embedded systems with intensive I/O. Given the limitations of serial ports, we do not intend to propose our experimental platform as a prototype of a new distributed network architecture. We chose this network platform primarily because it represents the state of the art in power management capabilities. It is also consistent with the relatively expensive communication, both in terms of time and energy, seen by such systems. We expect that our findings in this paper can be applied to many communication-intensive applications on other network architectures, where communication is a key factor for both performance and power management.

4.3 Performance Profile of the ATR Algorithm

Each single iteration of the entire ATR algorithm takes 1.1 seconds to complete on one Itsy node running at the peak clock rate of 206.4 MHz. When the clock rate is reduced, the performance degrades linearly with the clock rate. The PPP connection on the serial port has a maximum data rate of 115.2 Kbps, though our measured data rate is roughly 80 Kbps. In addition, the startup time for establishing a single communication transaction takes 50-100 ms. The computation and communication behaviors are profiled and summarized in Fig. 6. The functional blocks can be all combined into one node or distributed onto multiple nodes


in a pipeline. In the single-node case there are no communications between adjacent nodes, although the node still has to communicate with the host computer.

4.4 Power Profile of the ATR Algorithm

Fig. 6 shows the net current draw of one Itsy node. The horizontal axis represents the frequency and corresponding voltage levels. The data are collected by Itsy's built-in power monitor. During all experiments the LCD screen and the speaker are turned off to reduce unnecessary power consumption. The execution of the ATR algorithm on Itsy has three modes of operation: idle, communication, and computation. In idle mode, the Itsy node has neither I/O nor any computation workload. In communication mode, it is either sending or receiving data through the serial port. In computation mode, it executes the ATR algorithm. Fig. 6 shows the three curves ranging from 30 mA to 130 mA, indicating a power range from 0.1 W to 0.5 W. The computation always dominates the power consumption. However, due to the slow data rate of the serial port, communication tasks have long delays and thus consume a significant amount of energy, although the communication power level is not the highest. As a result, I/O energy becomes a primary target to optimize in addition to DVS on computation.

4.5 Metrics

We evaluate several DVS techniques by a series of experiments with one or more Itsy nodes. The baseline configuration is a single Itsy node running the entire ATR algorithm at the highest clock rate. It is able to produce one result in every D seconds. For all experiments, we fix this frame delay D as the performance constraint and keep the Itsy node(s) running until the battery is fully discharged. The energy metric can be measured by the battery life B(N) when N nodes with N batteries are being used. The completed workload W(N) is the number of frames completed before battery exhaustion. The


battery life in the baseline configuration is B(1). Since the frame delay D is fixed, the host computer will transmit one frame to the first Itsy node in every D seconds.
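The metrics above can be summarized in a short sketch (the names bnorm and rnorm are ours, reconstructed from this section: battery life is normalized by the number of batteries, then expressed as a ratio against the single-node baseline B(1)):

```python
# Evaluation metrics as we read them (our sketch, not the paper's code):
# B(N) is battery life in hours with N nodes/batteries, Bnorm = B/N, and
# Rnorm normalizes Bnorm against the single-node baseline.
def bnorm(battery_life_hours: float, num_nodes: int) -> float:
    """Battery life normalized per battery."""
    return battery_life_hours / num_nodes

def rnorm(battery_life_hours: float, num_nodes: int, baseline_hours: float) -> float:
    """Normalized battery life as a percentage of the baseline."""
    return 100.0 * bnorm(battery_life_hours, num_nodes) / baseline_hours

# Numbers reported later: baseline B(1) = 6.1 h; partitioning gives B(2) = 14.1 h.
print(round(bnorm(14.1, 2), 2))     # 7.05
print(int(rnorm(14.1, 2, 6.1)))     # 115, i.e. only ~15% effective improvement
```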

5 Techniques under Evaluation

We first define the baseline configuration as a reference against which to compare experimental results. We then briefly review the DVS techniques to be evaluated by our experiments.

5.1 Baseline Configuration

The baseline configuration is a single Itsy node performing the entire ATR algorithm. It operates at the highest CPU clock rate of 206.4 MHz. The processing task PROC requires 1.1 seconds to complete. The node also needs 1.1 and 0.1 seconds to receive and send data, respectively. Therefore the total time to process one frame is D = 2.3 seconds. Based on the metrics we defined in Section 4.5, we fix this frame delay D = 2.3 seconds in all experiments.

5.2 DVS during I/O

The first technique is to perform DVS during the I/O period. Since the application is tightly constrained on timing with expensive I/O delay, there is not much opportunity for DVS on computation without a performance penalty. On the other hand, since the Itsy node spends a long time on communication, it is possible to apply DVS during I/O. Based on the power characteristics shown in Fig. 7, I/O can operate at a significantly lower power level at the slowest frequency of 59 MHz.

5.3 Distributed DVS by Partitioning

Partitioning the algorithm onto multiple nodes can create more time per slot for DVS on each distributed node. However, since the application is already I/O-bound, additional communication between nodes can further increase the I/O pressure. A few concerns


must be taken into account to correctly balance computation and I/O. First, each node must be able to complete its tasks RECV, PROC, and SEND within D = 2.3 seconds. With an unbalanced partitioning, a node can be overloaded with either excessive I/O or heavy computation, such that it cannot finish its work on time and the whole pipeline will fail to meet the performance constraint. Second, additional communication can potentially saturate the network such that none of the nodes can guarantee to finish their workload on time. Finally, the distributed system should deliver an extended battery life in normalized terms, not just a longer absolute uptime. We experiment with two Itsy nodes, although the results do generalize to more nodes. Based on the block diagram in Fig. 6, three partitioning schemes are available, illustrated in Fig. 8. The first scheme, where Node1 is only responsible for target detection and Node2 performs the remaining three functional blocks, is clearly the best among all three solutions. Due to the least amount of I/O, both nodes are allowed to run at much lower clock rates.

5.4 Distributed DVS with Power Failure Recovery

In general, it is impossible to evenly distribute the workload to each node in a distributed system. In many cases even the optimal partitioning scheme yields a very unbalanced workload distribution. In our experiments, Node2, with more workload, will have to run faster and thus its battery will exhaust sooner. After one node fails, the distributed pipeline will simply stall although the remaining nodes still have sufficient battery capacity to keep working. This will result in unnecessary loss of battery capacity. One potential solution is to recover from the power failure on one node by detecting the faulting node dynamically and migrating its computation to neighboring nodes. Such techniques normally require additional control messages between nodes, thereby increasing I/O pressure on the already I/O-bound applications. Since these messages will also cost time, they will force an increase in computation speed such that the node will fail even sooner. As a proof of concept, we implement a fault recovery scheme as follows. Each sending transaction must be acknowledged by the receiver. A timeout mechanism is used on each node to detect the failure of the neighboring nodes. The computation share of the failed node will then migrate to one of its neighboring nodes. The message reporting a faulting node can be encapsulated into the sending data stream and the acknowledgment. Therefore, the information can be propagated to all nodes in the system. As mentioned in Section 4.3, the acknowledgment signal requires a separate transaction, which typically costs 50-100 ms in addition to the extended I/O delay. Since the frame delay D is fixed, the processor must run faster to meet the timing constraint due to the increased I/O delay to support the power failure recovery mechanism.

5.5 Distributed DVS with Node Rotation

As an alternative to the power failure recovery scheme in Section 5.4, we balance the load on each node more efficiently with a new technique: we periodically rotate the roles of the nodes in the pipeline, so that each battery alternates between the heavy and light shares of the workload and the discharge rates are balanced among the nodes. If all nodes are evenly balanced, then after the first battery fails, the other batteries will also exhaust shortly.
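A minimal sketch of the rotation policy as we read it (the 100-frame period and the 59/103.2 MHz levels are taken from the experiments reported later; the function name is ours):

```python
# Node rotation sketch (our reading of the technique, not the authors' code):
# every ROTATION_PERIOD frames the two nodes swap roles, so each battery
# alternates between the heavy (103.2 MHz) and light (59 MHz) pipeline share.
ROTATION_PERIOD = 100  # frames between swaps, as used in the experiments

def node_speeds(frame_index: int):
    """Return (Node1 MHz, Node2 MHz) for a given frame."""
    heavy, light = 103.2, 59.0
    if (frame_index // ROTATION_PERIOD) % 2 == 0:
        return (light, heavy)   # Node1 light, Node2 heavy
    return (heavy, light)       # roles swapped

print(node_speeds(0))     # (59.0, 103.2)
print(node_speeds(150))   # (103.2, 59.0)
```

Averaged over many rotation periods, each node spends half its time at each speed, which is what balances the discharge rates.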

6 Experimental Results

We evaluate the DVS techniques described in Section 5 by experiments, then analyze the results in the context of a distributed, I/O-bound system.

6.1 (0A, 0B) Initial Evaluation without I/O

Before experimenting with DVS under I/O, we first perform two simple experiments on a single Itsy node to explore the potential of DVS without I/O. The single Itsy node reads local copies of the raw images and only computes the results, instead of receiving images from the host and sending the results back. Therefore there is no communication delay or


energy consumption involved. (0A): We use one Itsy node to keep running the entire ATR algorithm at the full speed of 206.4 MHz. Its battery exhausts in 3.4 hours with 11.5K frames completed. (0B): We set up a second Itsy node to execute at the half speed of 103.2 MHz. It is able to continue operating for 12.9 hours, finishing 22.5K frames. At the half clock rate, the Itsy computer can complete twice as much workload as it can at the full speed. We overload the metrics notation we defined in Section 4.5 as follows: B maps the experiment label to the total battery life, and W maps the experiment label to the number of frames processed. Here, B(0A) = 3.4 (hours), W(0A) = 11500, B(0B) = 12.9, W(0B) = 22500. Note that these results are not to be compared with other experiments, since there is no communication and no performance constraint. The results are promising for having more nodes as a distributed system. By using two Itsy nodes running at the half speed, the system should be able to deliver the same performance as one Itsy node at the full speed, while completing four times the workload by using two batteries. However, such an upper bound can only be achieved without the presence of I/O.

6.2 (1) Baseline Configuration

We defined the baseline configuration in Section 5.1. The single Itsy node running at 206.4 MHz can last for 6.1 hours and finish 9.6K frames before the battery dies. That is, B(1) = Bnorm(1) = 6.1, W(1) = 9600, Rnorm(1) = 100%. Compared with experiment (0A) without I/O, the completed workload is 17% less, since the node must spend a long time during I/O.
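As a sanity check (ours, not from the paper), the completed workload should follow directly from the battery life once the frame delay D = 2.3 s is fixed:

```python
# Consistency check (our sketch): with the frame delay D fixed, the number
# of completed frames is roughly battery life divided by D.
D = 2.3  # seconds per frame

def expected_frames(battery_life_hours: float) -> int:
    """Frames completed if one frame is delivered every D seconds."""
    return int(battery_life_hours * 3600 / D)

print(expected_frames(6.1))  # 9547 -- consistent with the reported 9.6K frames
```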


6.3 (1A) DVS during I/O

As Section 5.2 suggests, we apply DVS to I/O periods, such that during sending and receiving the Itsy node operates at 59 MHz, while in computation it still runs at 206.4 MHz. From our measurements, communication delay does not increase at a lower clock rate. Thus the performance remains the same, with D = 2.3 seconds. Through DVS during I/O, the battery life is extended to 7.6 hours and the node is able to finish 11.9K frames. That is, B(1A) = Bnorm(1A) = 7.6, W(1A) = 11900, Rnorm(1A) = 124%, indicating a 24% increase in battery life. Note that W(1A) > W(0A) = 11500. Even though the Itsy node is communicating a large amount of data with the host computer, it completes more workload than it does in experiment (0A) without I/O. This is due to the recovery effect of batteries.

6.4 (2) Distributed DVS by Partitioning

Since there are no further opportunities for DVS with the single node, from now on we evaluate distributed configurations with two Itsy nodes in a pipeline. In Section 5.3 we selected the best partitioning scheme, in which the two Itsy nodes operate at 59 MHz and 103.2 MHz, respectively. The distributed two-node pipeline is able to complete 22.1K frames in 14.1 hours. That is, B(2) = 14.1, W(2) = 22100. Compared to experiment (1), the battery life is more than doubled. However, after normalizing the results for two batteries, Bnorm(2) = 7.05 and Rnorm(2) = 115%, meaning the battery life is only effectively extended by 15%. Distributed DVS is even less efficient than (1A), in which DVS during I/O can extend the battery capacity by 24%. There are a few reasons behind these results. First, when Node2 fails, the pipeline simply stalls while plenty of energy still remains in the battery of Node1. Second, Node2 always fails first because the workload on the two nodes is not balanced very well. Node2 has much more computation load and has to run at 103.2 MHz, while Node1 has very little computation such that it operates at 59 MHz. However, this partitioning scheme is already optimal, with the maximally balanced load. If we choose other partitioning schemes, the system will fail even sooner, as analyzed in Section 5.3.

6.5 (2A) Distributed DVS during I/O

DVS during I/O (1A) can extend battery life by 24% for a single node. We expect the distributed pipeline can also benefit from the same technique by applying DVS during I/O on the distributed nodes. Among the two Itsy nodes, Node1 is already configured at the lowest clock rate. Therefore, we can only reduce the clock rate of Node2 to 59 MHz during its I/O period and leave it at 103.2 MHz for computation. The result is B(2A) = 14.44, W(2A) = 22600, Bnorm(2A) = 7.22 and Rnorm(2A) = 118%. Only 3% more battery capacity is observed compared with experiment (2). Distributed DVS during I/O is not as effective as DVS during I/O for a single node. According to the power profile in Fig. 7, from (1) to (1A) the discharge current drops from 110 mA to 40 mA during I/O periods, which take half of the execution time of the single node. However, from (2) to (2A), we only optimize Node2, which has already operated at a low power level during I/O (55 mA). By DVS during its I/O periods, the discharge current decreases to 40 mA. Thus, the 15 mA reduction is not as considerable compared with the 70 mA saving in experiment (1A). In addition, Node2 does not spend a long time during I/O. It only communicates 700 bytes in very short periods. Therefore, the small reduction to a small portion of power use contributes trivially to the system. On the other hand, Node1 has a heavy I/O load. However, since it runs at the lowest power level, there is no chance to


further optimize its I/O power. From experiments (2) and (2A) we learn a few lessons. Although a distributed system offers DVS opportunities that are not available on a single processor, the energy saving is no longer decided merely by the processor speed. In a single processor, minimizing energy directly optimizes the lifetime of its single battery. However, in a distributed system, batteries are also distributed. Minimizing global energy does not guarantee to extend the lifetime of all batteries. In our experiments, the load pattern of both communication and computation decides the shortest battery life, which often determines the uptime of the whole system.

6.6 (2B) Distributed DVS with Power Failure Recovery

In experiments (2) and (2A), the whole distributed system fails after Node2 fails, although Node1 is still capable of carrying on the entire algorithm. We attempt to enable the system to detect the failure of Node2 and reconfigure the remaining Node1 to continue operating. Our approach is described in Section 5.4. We use the same partitioning scheme as in (2) and (2A). Due to the additional communication transactions for control messages, both nodes have to run faster. As a result, Node1 must operate at 73.7 MHz, and Node2 at 118 MHz. We also perform DVS during I/O on both nodes. The result is B(2B) = 15.72, W(2B) = 24500, Bnorm(2B) = 7.86 and Rnorm(2B) = 128%. With our recovery scheme, the system can last longer than (2) and (2A). However, there is no significant improvement compared to the simple DVS-during-I/O scheme (1A). Since both nodes must run faster, Node2 will fail more quickly, after completing 19.5K frames, and Node1 can pick up another 5K frames until all batteries have exhausted. Power failure recovery allows the system to continue functioning with failed nodes. However, it is expensive in the sense that it must be supported with additional, expensive energy consumption.

6.7 (2C) Distributed DVS with Node Rotation

Up to now the distributed DVS approaches do not seem effective enough. In experiments (2) and (2A), the failure of Node2 shuts down the whole system. Experiment (2B) allows the remaining Node1 to continue. However, the power failure recovery scheme also consumes energy before it can save energy. What prevents a longer battery life is the unbalanced load between Node1 and Node2. In this new experiment we implemented our node rotation technique presented in Section 5.5, combined with DVS during I/O. Since there is no performance penalty, the two nodes can still operate at 59 MHz and 103.2 MHz. By node rotation every 100 frames, the battery life can be extended to B(2C) = 17.82, W(2C) = 27900, Bnorm(2C) = 8.91 and Rnorm(2C) = 145%. This is the best result among all the techniques we have evaluated. Node rotation allows the workload to be evenly distributed over the network and thus maximally utilizes the distributed battery capacity. There is also an additional benefit. Since both nodes alternate their frequency between 103.2 MHz and 59 MHz, both batteries can take advantage of the recovery effect to further extend their capacity. To summarize, our experimental results are presented in Fig. 10. Both absolute and normalized battery lives are illustrated, with normalized ratios annotated. The results of experiments (0A) and (0B) without communication are not included, since it is not proper to compare them with I/O-bound results. It should be noted that the effectiveness of these techniques is application-dependent. Although experiments (2) and (2A) do not show much improvement in this case study, the corresponding techniques can still be effective for other applications.


7 Conclusion

This paper evaluates DVS techniques for distributed low-power embedded systems. DVS has been suggested as an effective technique for energy reduction in a single processor. As DVS opportunities diminish in communication-bound, time-constrained applications, a distributed system can expose richer parallelism that allows further optimization of both performance and DVS opportunities. However, designers must be aware of many tricky and often counterintuitive issues, such as additional I/O, partitioning, power failure recovery and load balancing, as indicated by our study. We presented a case study of a distributed embedded application under various DVS techniques. We performed a series of experiments and measurements on actual hardware with DVS under I/O-intensive workload, which is typically ignored by many DVS studies. We also proposed a new load balancing technique that enables more aggressive distributed DVS and maximizes the uptime of battery-powered, distributed embedded systems.

REFERENCES

[2] K. Choi, K. Dantu, W.-C. Cheng, and M. Pedram. Frame-based dynamic voltage and frequency scaling for a MPEG decoder. In Proc. International Conference on Computer-Aided Design, pages 732-737, November 2002.
