8/13/2019 Distributed Low Power Embedded System
http://slidepdf.com/reader/full/distributed-low-power-embedded-system 1/15
PAPER PRESENTED ON:
DISTRIBUTED LOW POWER EMBEDDED SYSTEM
ABSTRACT
A multiple-processor system can potentially achieve higher energy savings than a single processor, because the reduced workload on each processor creates new opportunities for dynamic voltage scaling (DVS). However, as the cost of communication starts to match or surpass that of computation, many new challenges arise in making DVS effective in a distributed system under communication-intensive workload. This paper discusses implementation issues for supporting DVS on distributed embedded processors. We implemented and evaluated four distributed schemes: (1) DVS during I/O, (2) partitioning, (3) power failure recovery, and (4) node rotation. We validated the results on a distributed embedded system with the Itsy pocket computers connected by serial links. Our experiments confirmed that a distributed system can create new DVS opportunities and achieve further energy savings. However, a surprising result was that aggregate energy savings do not translate directly into a longer battery life. In fact, the best partitioning scheme, which distributed the workload onto two nodes and enabled the most power-efficient CPU speeds at 30-50%, resulted in only a 15% improvement in battery lifetime. Of the four techniques evaluated, node rotation showed the most measurable improvement to battery lifetime, at 45%, by balancing the discharge rates among the nodes.
1 Introduction
Dynamic voltage scaling (DVS) is one of the most studied topics in low-power embedded
systems. Based on CMOS characteristics, power consumption is proportional to V squared, while the supply voltage V is linearly proportional to the clock frequency. To fully exploit this quadratic power vs. voltage scaling effect, previous studies have extensively explored DVS with real-time and non-real-time scheduling techniques. As DVS reaches its limit on a single processor, researchers have turned to multiple processors to create additional opportunities for DVS. Multiple processors can potentially achieve higher energy savings than a single processor. By partitioning the workload onto multiple processors, each processor is responsible for only a fraction of the workload and can operate at a lower voltage/frequency level with quadratic power savings. Meanwhile, the lost performance can be compensated by the increased parallelism. Another advantage of a distributed scheme is that heterogeneous hardware such as DSPs and other accelerators can further improve the power efficiency of various stages of the computation through specialization. Although tightly-coupled, shared-memory multiprocessor architectures may have more power/performance advantages, they are not as scalable as distributed, message-passing schemes.

While distributed systems have many attractive properties, they pay a higher price for message-passing communication. Each node must now handle not only I/O with the external world, but also I/O on the internal network. Programming for distributed systems is also inherently more difficult than for single processors. Although higher-level abstractions have been proposed to facilitate distributed programming, these abstraction layers generate even more inter-processor communication traffic behind the scenes. While this may be appropriate for high-performance cluster computers with multi-tier, multi-gigabit switches like Myrinet or Gigabit Ethernet, such high-speed, high-power communication media are not realistic for battery-powered embedded systems. Instead, the low-power requirement has constrained the communication interfaces to much slower, often serial interfaces such as I2C and CAN. As a result, even if the actual data workload is not large on an absolute scale, it appears expensive relative to the computation performance that can be delivered by today's low-power embedded microprocessors. The effect of I/O on embedded systems has not been well studied in existing DVS work. Many existing DVS techniques have shown impressive power savings on a single processor. However, few results have been fully qualified in the context of an entire system. Even fewer have been validated on actual hardware. One common simplifying assumption is to ignore I/O.
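The quadratic argument above can be illustrated with a back-of-the-envelope model (a sketch only; the effective capacitance and the voltage/frequency pairs are hypothetical, not measured values from any particular processor):

```python
# Dynamic power of CMOS logic: P = C * V^2 * f, where C lumps together
# switched capacitance and activity.  Energy for a fixed workload of
# `cycles` clock cycles is P * (cycles / f) = C * V^2 * cycles, so the
# energy of a task depends on V^2, not on how fast it runs.
def dynamic_power(c_eff, v, f):
    return c_eff * v * v * f

def task_energy(c_eff, v, f, cycles):
    return dynamic_power(c_eff, v, f) * (cycles / f)

# Hypothetical operating points: halving the clock lets us roughly
# halve the supply voltage (V scales about linearly with f).
full = {"v": 1.5, "f": 200e6}    # full speed
half = {"v": 0.75, "f": 100e6}   # half speed, scaled voltage
CYCLES = 2e8                     # fixed workload

e_full = task_energy(1e-9, full["v"], full["f"], CYCLES)
e_half = task_energy(1e-9, half["v"], half["f"], CYCLES)
# Same work at half voltage costs one quarter of the energy,
# at the price of taking twice as long.
```

Running the task at half speed and half voltage takes twice as long but costs a quarter of the energy, which is exactly the trade-off DVS exploits when slack is available.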
2 Related Work
Real-time scheduling has been extended to DVS scheduling on variable-voltage processors. The initial scheduling model was introduced by Yao et al. [10], then extended and refined by Yasuura [6], Quan [7], and many studies on variations of real-time scheduling problems. Since power is a quadratic function of the supply voltage, lowering the voltage can result in significant savings while still enabling the processor to make enough progress that tasks complete before their deadlines. These techniques often focus on the energy reduction of the processor only, while the power consumption of other components, including memory and I/O, is ignored. The results are rarely validated on real hardware. DVS has been applied to benchmark applications such as JPEG and MPEG in embedded systems. Im et al. [4] propose buffering incoming tasks so that the idle period between task arrivals can be utilized by DVS.
Shin et al. [8] introduce an intra-task DVS scheme that maximally utilizes the slack time within one task. Choi et al. [2] present a DVS scheme for an MPEG decoder that takes advantage of the different types of frames in the MPEG stream. These techniques can be validated by measurement on real or emulated platforms. However, they are also computation-oriented, such that the processor performs only very little I/O, if any. The impact of I/O thus remains under-studied. DVS has recently been extended to multi-processor systems. Weglarz [9] proposes partitioning the computation onto a multi-processor architecture that consumes significantly less power than a single processor. There is a fundamental difference in applying techniques for multiprocessors to distributed systems. Minimizing the global energy consumption will extend the battery life only if the whole system is assumed to be powered by a single battery unit. In a distributed environment, each node is powered by a dedicated battery. Even a globally optimal solution may cause poor battery efficiency locally and result in shortened system uptime as well as loss of battery capacity. Maleki et al. [5] analyze the energy efficiency of routing protocols in an ad-hoc network and show that globally optimal schemes often contradict the goal of extending the lifetime of a distributed system.
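The distinction between global energy and per-battery lifetime can be made concrete with a toy model (a sketch with hypothetical capacities and currents, not numbers from the cited work; each node runs on its own battery, and the system dies when the first battery is exhausted):

```python
# Each node draws a constant current from its own battery.  System
# uptime is limited by the worst node, while a "global energy" view
# would only look at the summed draw.
def uptime_hours(capacities_mah, draws_ma):
    return min(c / d for c, d in zip(capacities_mah, draws_ma))

CAPACITY = [1000, 1000]      # mAh per node battery (hypothetical)

balanced = [100, 100]        # mA per node: total draw 200 mA
unbalanced = [40, 150]       # mA per node: total draw only 190 mA

# The unbalanced schedule is globally *more* efficient (190 < 200 mA),
# yet the system dies sooner because node 2's battery drains fastest.
assert sum(unbalanced) < sum(balanced)
assert uptime_hours(CAPACITY, unbalanced) < uptime_hours(CAPACITY, balanced)
```

This is the effect the related work above points at: minimizing aggregate energy and maximizing the lifetime of a multi-battery system are different objectives.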
3 Motivating Example

We select an image-processing algorithm, automatic target recognition (ATR), as our motivating example to evaluate a realistic application under I/O pressure. Its block diagram is shown in Fig. 1. The algorithm is able to detect pre-defined targets in an input image. For each target, a region of interest is extracted and filtered by templates. Finally, the distance of each target is computed. A throughput constraint is imposed such that the image frames must be processed at a fixed rate. We evaluate a few DVS schemes on one or more embedded processors that perform the ATR algorithm. Unlike many DVS studies that ignore I/O, we assume that all nodes in our system are connected to a communication network. This network carries data from external sources (e.g., a camera, sensor, etc.), internal communications between the nodes, and data to an external destination (e.g., a PC). This study assumes only one image and one target are processed at a time, although a multi-frame, multi-target version of the algorithm is also available. We refer to each embedded
processor as a node. A node is a full-fledged computer system with a voltage-scalable processor, I/O devices, and memory. Each node performs a computation task PROC and two communication tasks RECV and SEND. RECV receives data from the external source or from another node. The data is processed by PROC, which consists of one or more functional blocks of the ATR algorithm. The result is transmitted by task SEND to another node or to the destination. Due to data dependencies, tasks RECV, PROC, and SEND must be fully serialized on each node. In addition, they must complete within a time period called the frame delay D, which is defined as the performance constraint. Fig. 2 illustrates the timing vs. power diagram of a node.

When multiple nodes are configured as a distributed system, we organize them in a pipeline for the ATR algorithm. Fig. 3 shows the timing vs. power diagram of two pipelined nodes performing the ATR algorithm. Node1 maps the first two functional blocks, and Node2 performs the other two blocks. Node1 receives one frame from the data source, processes the data, and sends the intermediate result to Node2 within D seconds. After Node2 starts receiving from Node1, it finishes its share of the computation and sends the final result to the destination within D seconds. Fig. 3 shows that if the data source keeps producing one frame every D seconds, and both Node1 and Node2 can send their results also within D seconds, then the distributed pipeline is able to deliver one result every D seconds to the destination.
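The feasibility condition described above (RECV, PROC, and SEND serialized within the frame delay D on every node) can be sketched as a simple check. The per-task times for the split configuration below are hypothetical; the single-node times anticipate the profile measured later:

```python
# Each node runs RECV, PROC, SEND back to back.  The pipeline meets
# the throughput constraint (one result every D seconds) iff every
# node finishes its three serialized tasks within the frame delay D.
def node_busy_time(recv_s, proc_s, send_s):
    return recv_s + proc_s + send_s

def pipeline_feasible(nodes, d_seconds):
    # Small tolerance guards against floating-point rounding.
    return all(node_busy_time(*n) <= d_seconds + 1e-9 for n in nodes)

D = 2.3  # frame delay in seconds (the single-node value used later)

single = [(1.1, 1.1, 0.1)]      # one node runs all ATR blocks
split = [(1.1, 0.4, 0.6),       # Node1: first blocks (hypothetical times)
         (0.6, 0.7, 0.1)]       # Node2: remaining blocks (hypothetical)
# Both fit within D, so each configuration delivers one frame per D;
# the split one leaves more slack per node for voltage scaling.
```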
4 Experimental Platform

We use the Itsy pocket computers as distributed nodes, connected by a TCP/IP network over serial links. We present the performance and power profiles of the ATR algorithm running on Itsy computers and define the metrics for our experiments.

4.1 The Itsy Pocket Computer

The Itsy pocket computer is a full-fledged miniaturized computer system developed by the Compaq Western Research Lab [1, 3]. It supports DVS on the StrongARM SA-1100 processor with 11 frequency levels from 59 to 206.4 MHz over 4 different voltage levels. Itsy also has 32 MB of flash memory for off-line storage, 32 MB of DRAM as the main memory, and a RAM disk. The power supply is a 4 V lithium-ion battery pack. Due to the power density constraint of the battery, Itsy currently does not support high-speed I/O such as Ethernet or USB. The applicable I/O ports are a serial port and an infrared port. Itsy runs Linux with networking support. Its block diagram is shown in Fig. 4.

4.2 Network Configuration

We currently use the serial port as the network interface. We set up a separate host computer as both the external source and destination. It connects to the Itsy nodes through multiple serial ports established by USB/serial adaptors. We set up individual PPP (point-to-point protocol) connections between each Itsy node and the host computer. Therefore the host computer acts as the hub for multiple PPP networks, and it assigns a unique IP address to each Itsy node. Finally, we start the IP forwarding service on the host computer to allow the Itsy nodes to communicate with each other
transparently, as if they were on the same TCP/IP network. The network configuration is shown in Fig. 5. The serial link may not be the best choice of interconnect, but it is often used in practice due to power constraints. A high-speed network interface requires several watts of power, which is too much for battery-powered embedded systems such as Itsy. In this paper, our primary goal is to investigate the new opportunities for DVS-enabled power vs. performance trade-offs in distributed embedded systems with intensive I/O. Given the limitations of serial ports, we do not intend to propose our experimental platform as a prototype of a new distributed network architecture. We chose this network platform primarily because it represents the state of the art in power management capabilities. It is also consistent with the relatively expensive communication, in terms of both time and energy, seen by such systems. We expect that our findings can be applied to many communication-intensive applications on other network architectures, where communication is a key factor for both performance and power management.

4.3 Performance Profile of the ATR Algorithm

Each single iteration of the entire ATR algorithm takes 1.1 seconds to complete on one Itsy node running at the peak clock rate of 206.4 MHz. When the clock rate is reduced, the performance degrades linearly with the clock rate. The PPP connection on the serial port has a maximum data rate of 115.2 Kbps, though our measured data rate is roughly 80 Kbps. In addition, the startup time for establishing a single communication transaction takes 50-100 ms. The computation and communication behaviors are profiled and summarized in Fig. 6. The functional blocks can be all combined onto one node or distributed onto multiple nodes
in a pipeline. In the single-node case there are no communications between adjacent nodes, although the node still has to communicate with the host computer.

4.4 Power Profile of the ATR Algorithm

Fig. 7 shows the net current draw of one Itsy node. The horizontal axis represents the frequency and corresponding voltage levels. The data are collected by Itsy's built-in power monitor. During all experiments the LCD screen and the speaker are turned off to avoid unnecessary power consumption. The execution of the ATR algorithm on Itsy has three modes of operation: idle, communication, and computation. In idle mode, the Itsy node has neither I/O nor any computation workload. In communication mode, it is either sending or receiving data through the serial port. In computation mode, it executes the ATR algorithm. Fig. 7 shows the three curves ranging from 30 mA to 130 mA, indicating a power range from 0.1 W to 0.5 W. The computation always dominates the power consumption. However, due to the slow data rate of the serial port, communication tasks have long delays and thus consume a significant amount of energy, although the communication power level is not the highest. As a result, I/O energy becomes a primary optimization target in addition to DVS on computation.

4.5 Metrics

We evaluate several DVS techniques in a series of experiments with one or more Itsy nodes. The baseline configuration is a single Itsy node running the entire ATR algorithm at the highest clock rate. It is able to produce one result every D seconds. For all experiments, we fix this frame delay D as the performance constraint and keep the Itsy node(s) running until the battery is fully discharged. The energy metric is measured by the battery life L(N) when N nodes with N batteries are being used. The completed workload W(N) is the number of frames completed before battery exhaustion. The
battery life in the baseline configuration is L(1). Since the frame delay D is fixed, the host computer transmits one frame to the first Itsy node every D seconds.
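Under these metrics, comparisons against the baseline are made per battery, which can be sketched as follows (the 14.1-hour figure is the two-node result reported later, used here only as an example):

```python
# L(N): uptime of an N-node system with N batteries; W(N): frames
# completed before the first battery dies.  A fair comparison with the
# single-battery baseline divides the battery life by N.
def normalized_life(life_hours, n_nodes):
    return life_hours / n_nodes

def relative_life(life_hours, n_nodes, baseline_hours):
    """Normalized battery life as a fraction of the baseline L(1)."""
    return normalized_life(life_hours, n_nodes) / baseline_hours

# A 2-node system lasting 14.1 h against a 6.1 h baseline is only
# about a 15% effective improvement per battery, not a 131% one.
```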
5 Techniques under Evaluation

We first define the baseline configuration as a reference against which to compare experimental results. We then briefly review the DVS techniques to be evaluated by our experiments.

5.1 Baseline Configuration

The baseline configuration is a single Itsy node performing the entire ATR algorithm. It operates at the highest CPU clock rate of 206.4 MHz. The processing task PROC requires 1.1 seconds to complete. The node also needs 1.1 and 0.1 seconds to receive and send data, respectively. Therefore the total time to process one frame is D = 2.3 seconds. Based on the metrics defined in Section 4.5, we fix this frame delay D = 2.3 seconds in all experiments.

5.2 DVS during I/O

The first technique is to perform DVS during the I/O period. Since the application is tightly constrained on timing with expensive I/O delay, there is not much opportunity for DVS on computation without a performance penalty. On the other hand, since the Itsy node spends a long time on communication, it is possible to apply DVS during I/O. Based on the power characteristics shown in Fig. 7, I/O can operate at a significantly lower power level at the slowest frequency of 59 MHz.

5.3 Distributed DVS by Partitioning

Partitioning the algorithm onto multiple nodes can create more time per slot for DVS on each distributed node. However, since the application is already I/O-bound, additional communication between nodes can further increase the I/O pressure. A few concerns
must be taken into account to correctly balance computation and I/O. First, each node must be able to complete its tasks RECV, PROC, and SEND within D = 2.3 seconds. With an unbalanced partitioning, a node can be overloaded with either excessive I/O or heavy computation, such that it cannot finish its work on time, and the whole pipeline will then fail to meet the performance constraint. Second, additional communication can potentially saturate the network such that none of the nodes can be guaranteed to finish their workload on time. Finally, the distributed system should deliver an extended battery life in normalized terms, not just a longer absolute uptime. We experiment with two Itsy nodes, although the results do generalize to more nodes. Based on the block diagram in Fig. 6, three partitioning schemes are available, illustrated in Fig. 8. The first scheme, where Node1 is only responsible for target detection and Node2 performs the remaining three functional blocks, is clearly the best among the three solutions. Due to the least amount of I/O, both nodes are allowed to run at much lower clock rates.

5.4 Distributed DVS with Power Failure Recovery

In general, it is impossible to evenly distribute the workload to each node in a distributed system. In many cases even the optimal partitioning scheme yields a very unbalanced workload distribution. In our experiments, Node2, with more workload, has to run faster, and thus its battery will be exhausted sooner. After one node fails, the distributed pipeline will simply stall, although the remaining nodes still have sufficient battery capacity to keep working. This results in unnecessary loss of battery capacity. One potential solution is to recover from the power failure on one node by detecting the faulting node dynamically and migrating its computation to neighboring nodes. Such techniques normally require additional control messages between nodes, thereby increasing I/O pressure on the already I/O-bound application. Since these messages also cost time, they force an increase in computation speed, such that the node will fail even sooner. As a proof of concept, we implement a fault recovery scheme as follows. Each sending transaction must be acknowledged by the receiver. A timeout mechanism is used on each node to detect the failure of the neighboring nodes. The computation share of the failed node will then migrate to one of its neighboring nodes. The message reporting a faulting node can be encapsulated into the sending data stream and the acknowledgment. Therefore, the information can be propagated to all nodes in the system. As mentioned in Section 4.3, the acknowledgment signal requires a separate transaction, which typically costs 50-100 ms in addition to the extended I/O delay. Since the frame delay D is fixed, the processor must run faster to meet the timing constraint despite the increased I/O delay incurred to support the power failure recovery mechanism.

5.5 Distributed DVS with Node Rotation

As an alternative to the power failure recovery scheme in Section 5.4, we balance the load on each node more efficiently with a new technique. If all nodes are evenly balanced, then after the first battery fails, the other batteries will also be exhausted shortly.
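A minimal sketch of the rotation policy (the discharge currents are illustrative, not Itsy measurements; a rotation period of 100 frames matches the setting used in the experiments):

```python
# Node rotation: the pipeline roles (heavy vs. light stage) are swapped
# every `period` frames, so over time each battery sees the average of
# the two discharge currents instead of always the worst one.
HEAVY_MA, LIGHT_MA = 150.0, 60.0   # illustrative per-role currents

def per_frame_draw(frame, node, period):
    """Current drawn by `node` (0 or 1) while processing `frame`."""
    swapped = (frame // period) % 2 == 1
    heavy_node = 0 if swapped else 1
    return HEAVY_MA if node == heavy_node else LIGHT_MA

def mean_draw(node, frames, period):
    return sum(per_frame_draw(f, node, period) for f in range(frames)) / frames
```

Without rotation one battery always sees the heavy 150 mA draw; with rotation both nodes average (150 + 60) / 2 = 105 mA, so the two batteries approach exhaustion together instead of one dying early.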
6 Experimental Results

We evaluate the DVS techniques described in Section 5 by experiments, then analyze the results in the context of a distributed, I/O-bound system.

6.1 (0A) & (0B): Initial Evaluation without I/O

Before experimenting with DVS under I/O, we first perform two simple experiments on a single Itsy node to explore the potential of DVS without I/O. The single Itsy node reads local copies of the raw images and only computes the results, instead of receiving images from the host and sending the results back. Therefore there is no communication delay or
energy consumption involved. (0A): We use one Itsy node to keep running the entire ATR algorithm at the full speed of 206.4 MHz. Its battery is exhausted in 3.4 hours with 11.5K frames completed. (0B): We set up a second Itsy node to execute at the half speed of 103.2 MHz. It is able to continue operating for 12.9 hours, finishing 22.5K frames. At the half clock rate, the Itsy computer can complete twice as much workload as it can at the full speed. We overload the metrics notation defined in Section 4.5 as follows: L maps the experiment label to the total battery life, and W maps the experiment label to the number of frames processed. Here, L(0A) = 3.4 (hours), W(0A) = 11500, L(0B) = 12.9, W(0B) = 22500. Note that these results are not to be compared with the other experiments, since there is no communication and no performance constraint. The results are promising for having more nodes as a distributed system: by using two Itsy nodes running at the half speed, the system should be able to deliver the same performance as one Itsy node at the full speed, while completing four times the workload by using two batteries. However, such an upper bound can only be achieved without the presence of I/O.

6.2 (1): Baseline Configuration

We defined the baseline configuration in Section 5.1. The single Itsy node running at 206.4 MHz lasts for 6.1 hours and finishes 9.6K frames before the battery dies. That is, L(1) = Lnorm(1) = 6.1, W(1) = 9600, Rnorm(1) = 100%. Compared with experiment (0A) without I/O, the completed workload is 17% less, since the node must spend a long time on I/O.
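A quick arithmetic check of the baseline figures, using only the numbers reported above:

```python
# With one frame accepted every D = 2.3 s, the measured 6.1 h battery
# life bounds how many frames the baseline node can complete.
D = 2.3            # frame delay in seconds
LIFE_HOURS = 6.1   # measured baseline battery life L(1)

frames_upper_bound = LIFE_HOURS * 3600 / D
# ~9548 frames, consistent with the reported W(1) = 9600 once the
# one-decimal rounding of the battery life is taken into account.
```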
6.3 (1A): DVS during I/O

As Section 5.2 suggests, we apply DVS to the I/O periods, such that during sending and receiving the Itsy node operates at 59 MHz, while for computation it still runs at 206.4 MHz. From our measurements, the communication delay does not increase at the lower clock rate. Thus the performance remains the same, with D = 2.3 seconds. Through DVS during I/O, the battery life is extended to 7.6 hours and the node is able to finish 11.9K frames. That is, L(1A) = Lnorm(1A) = 7.6, W(1A) = 11900, Rnorm(1A) = 124%, indicating a 24% increase in battery life. Note that W(1A) > W(0A) = 11500. Even though the Itsy node is communicating a large amount of data with the host computer, it completes more workload than it does in experiment (0A) without I/O. This is due to the recovery effect of batteries.

6.4 (2): Distributed DVS by Partitioning

Since there are no further opportunities for DVS with the single node, from now on we evaluate distributed configurations with two Itsy nodes in a pipeline. In Section 5.3 we selected the best partitioning scheme, in which the two Itsy nodes operate at 59 MHz and 103.2 MHz, respectively. The distributed two-node pipeline is able to complete 22.1K frames in 14.1 hours. That is, L(2) = 14.1, W(2) = 22100. Compared to experiment (1), the battery life is more than doubled. However, after normalizing the results for two batteries, Lnorm(2) = 7.05 and Rnorm(2) = 115%, meaning the battery life is only effectively extended by 15%. Distributed DVS is even less efficient than (1A), in which DVS during I/O extends battery capacity by 24%. There are a few reasons behind these results. First, when Node2 fails, the pipeline simply stalls while plenty of energy still remains in the battery of Node1. Second, Node2 always fails first because the workload on the two nodes is not balanced very well. Node2 has a much larger computation load and has to run at 103.2 MHz, while Node1 has very little computation, such that it operates at 59 MHz. However, this partitioning scheme is already optimal, with the maximally balanced load. If we choose other partitioning schemes, the system will fail even sooner, as analyzed in Section 5.3.
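The first of these reasons, charge stranded in the surviving node, can be quantified with a toy model (battery capacities and currents below are illustrative, not Itsy measurements):

```python
# With per-node batteries, the pipeline stalls when the *first*
# battery dies, stranding whatever charge is left in the other node.
def stranded_fraction(capacities_mah, draws_ma):
    """Fraction of total charge unused when the first battery dies."""
    t_fail = min(c / d for c, d in zip(capacities_mah, draws_ma))
    used = sum(d * t_fail for d in draws_ma)
    return 1 - used / sum(capacities_mah)

# Node2 runs at 103.2 MHz and drains fast; Node1 idles along at 59 MHz.
caps = [1000.0, 1000.0]   # mAh (hypothetical)
draws = [45.0, 120.0]     # mA (hypothetical)
# About a third of the system's aggregate charge is stranded in Node1,
# while a perfectly balanced draw would strand none.
```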
6.5 (2A): Distributed DVS during I/O

DVS during I/O (1A) extends battery life by 24% for a single node. We expect the distributed pipeline can also benefit from the same technique by applying DVS during I/O on the distributed nodes. Of the two Itsy nodes, Node1 is already configured at the lowest clock rate. Therefore, we can only reduce the clock rate of Node2 to 59 MHz during its I/O periods and leave it at 103.2 MHz for computation. The result is L(2A) = 14.44, W(2A) = 22600, Lnorm(2A) = 7.22, and Rnorm(2A) = 118%. Only 3% more battery capacity is observed compared with experiment (2). Distributed DVS during I/O is not as effective as DVS during I/O for a single node. According to the power profile in Fig. 7, from (1) to (1A) the discharge current drops from 110 mA to 40 mA during I/O periods, which take up half of the execution time of the single node. However, from (2) to (2A), we only optimize Node2, which already operates at a low power level during I/O (55 mA). With DVS during its I/O periods, the discharge current decreases to 40 mA. Thus, the 15 mA reduction is not considerable compared with the 70 mA saving in experiment (1A). In addition, Node2 does not spend a long time on I/O. It only communicates 700 bytes in very short periods. Therefore, the small reduction to a small portion of the power use contributes trivially to the system. On the other hand, Node1 has a heavy I/O load. However, since it runs at the lowest power level, there is no chance to
further optimize its I/O power. From experiments (2) and (2A) we learn a few lessons. Although a distributed system offers DVS opportunities that are not available on a single processor, the energy saving is no longer decided merely by the processor speed. On a single processor, minimizing energy directly optimizes the lifetime of its single battery. In a distributed system, however, the batteries are also distributed. Minimizing global energy does not guarantee extending the lifetime of all batteries. In our experiments, the load pattern of both communication and computation decides the shortest battery life, which in turn determines the uptime of the whole system.

6.6 (2B): Distributed DVS with Power Failure Recovery

In experiments (2) and (2A), the whole distributed system fails after Node2 fails, although Node1 is still capable of carrying on the entire algorithm. We attempt to enable the system to detect the failure of Node2 and reconfigure the remaining Node1 to continue operating. Our approach is described in Section 5.4. We use the same partitioning scheme as in (2) and (2A). Due to the additional communication transactions for control messages, both nodes have to run faster. As a result, Node1 must operate at 73.7 MHz, and Node2 at 118 MHz. We also perform DVS during I/O on both nodes. The result is L(2B) = 15.72, W(2B) = 24500, Lnorm(2B) = 7.86, and Rnorm(2B) = 128%. With our recovery scheme, the system lasts longer than in (2) and (2A). However, there is no significant improvement compared to the simple DVS-during-I/O scheme (1A). Since both nodes must run faster, Node2 fails more quickly, after completing 19.5K frames, and Node1 can pick up another 5K frames until all batteries are exhausted. Power failure recovery allows the system to continue functioning with failed nodes. However, it is expensive in the sense that it must be supported with additional, expensive energy consumption.

6.7 (2C): Distributed DVS with Node Rotation

Up to now the distributed DVS approaches do not seem effective enough. In experiments (2) and (2A), the failure of Node2 shuts down the whole system. Experiment (2B) allows the remaining Node1 to continue, but the power failure recovery scheme also consumes energy before it can save energy. What prevents a longer battery life is the unbalanced load between Node1 and Node2. In this new experiment we implement our node rotation technique presented in Section 5.5, combined with DVS during I/O. Since there is no performance penalty, the two nodes can still operate at 59 MHz and 103.2 MHz. By rotating the nodes every 100 frames, the battery life can be extended to L(2C) = 17.82, with W(2C) = 27900, Lnorm(2C) = 8.91, and Rnorm(2C) = 145%. This is the best result among all the techniques we have evaluated. Node rotation allows the workload to be evenly distributed over the network and thus maximally utilizes the distributed battery capacity. There is also an additional benefit: since both nodes alternate their frequency between 103.2 MHz and 59 MHz, both batteries can take advantage of the recovery effect to further extend their capacity. To summarize, our experimental results are presented in Fig. 10. Both absolute and normalized battery lives are illustrated, with the normalized ratios annotated. The results of experiments (0A) and (0B) without communication are not included, since it is not proper to compare them with I/O-bound results. It should be noted that the effectiveness of these techniques is application-dependent. Although experiments (2) and (2A) do not show much improvement in this case study, the corresponding techniques can still be effective for other applications.
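The reported numbers can be recapped in one place (values transcribed from the experiments above; the computed ratios reproduce the annotated percentages to within about a percentage point of rounding):

```python
# Recap of the measured results: absolute battery life L, number of
# batteries N, and normalized life L/N relative to the baseline (1).
EXPERIMENTS = {          # label: (battery life in hours, batteries)
    "(1)":  (6.1, 1),    # baseline
    "(1A)": (7.6, 1),    # DVS during I/O
    "(2)":  (14.1, 2),   # partitioned pipeline
    "(2A)": (14.44, 2),  # partitioning + DVS during I/O
    "(2B)": (15.72, 2),  # + power failure recovery
    "(2C)": (17.82, 2),  # + node rotation (best)
}

baseline, _ = EXPERIMENTS["(1)"]
ratios = {k: (life / n) / baseline for k, (life, n) in EXPERIMENTS.items()}
# Node rotation (2C) gives the largest normalized improvement.
```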
7 Conclusion

This paper evaluates DVS techniques for distributed low-power embedded systems. DVS has been suggested as an effective technique for energy reduction on a single processor. As DVS opportunities diminish in communication-bound, time-constrained applications, a distributed system can expose richer parallelism that allows further optimization of both performance and DVS opportunities. However, designers must be aware of many tricky and often counterintuitive issues, such as additional I/O, partitioning, power failure recovery, and load balancing, as indicated by our study. We presented a case study of a distributed embedded application under various DVS techniques. We performed a series of experiments and measurements on actual hardware with DVS under I/O-intensive workload, which is typically ignored by many DVS studies. We also proposed a new load-balancing technique that enables more aggressive distributed DVS and maximizes the uptime of battery-powered, distributed embedded systems.
REFERENCES:
[2] K. Choi, K. Dantu, W.-C. Cheng, and M. Pedram. Frame-based dynamic voltage and frequency scaling for a MPEG decoder. In Proc. International Conference on Computer-Aided Design, pages 732-737, November 2002.