approximating computation and data for energy...
Post on 28-May-2020
11 Views
Preview:
TRANSCRIPT
ApproximatingComputationandDataforEnergyEfficiency
1stIWESSeptember 20th,2016,Pisa,Italy
EDAGroupPolitecnico diTorino
Torino,Italy
DanieleJahierPagliari
2
Outline
§ ErrorToleranceandApproximateComputing
§OurView
§ ACinProcessing
§ ACinInterconnects
§ ACinActuators(OLEDDisplays)
§ Conclusions
3
ErrorTolerance§ Manyemergingapplicationsareerrortolerant (orresilient)
Applications Error
Tolerance
Noisy Inputs (e.g. sensor
data)
Multiple "Golden" Outputs
Human Perception
(e.g. multimedia)
Algorithm Features
Error!
NoisyInput NoisyOutput
OriginalImage Errorsin3LSBs
4
ApproximateComputing
Performance Demands
Energy Budget
• ”Smart”Systems• InternetofThings
• Battery-operated• Energy-autonomous
Tradeoffenergyconsumption andoutputquality leveragingapplicationserrortolerance.
ApproximateComputing(AC):
Software LevelProcessor Level
Architecture LevelLogic Level
AbstractionLevels:ESsDesignChallenges:
ClassicAC:• Design-timeapproximations(fixederror).RecentTrend:• Runtime-reconfigurableerror.
Issues:• Whataboutsystem-level?.• Whataboutautomation?
5
Motivation§ EmbeddedSystem:
Environment
Processing
EETimes
Stanley-Marbell et al. DAC’16InSummary:§ Sensors,actuators andinterconnects are
relevantcontributorstoconsumption.§ Thebreakdownisstronglysystem-
dependent§ Approach:approximationasasystem-
leveldesignknob!
§ ESEnergyBreakdown:
6
ACinProcessing:RPR
Idea:[Shanbag etal.ISLPED’99]§ VoltageOver-Scaling(VOS)onthe
originalcircuit(MDSP)§ ErrorControlBlock (EC-Block)to
mitigatetheeffectoftimingerrors.
Original Circuit(MDSP)
Error ControlIN
yM
y
Decision Block
Estimator(RPR)
yr
Arrivaltime[s]
N.ofp
aths
Tclk
Arrivaltime[s]
N.ofpaths
TclkVOS
Viol.
7
ACinProcessing:RPR
Idea:[Shanbag etal.ISLPED’99]§ VoltageOver-Scaling(VOS)onthe
originalcircuit(MDSP)§ ErrorControlBlock (EC-Block)to
mitigatetheeffectoftimingerrors.
ECBlockStructure:§ Estimator oftheerror-freeoutput§ Decisionblock toselectbetween
MDSPandEstimatoroutputs
EstimatorImplementation:§ ReducedPrecisionReplica(RPR)
Original Circuit(MDSP)
Error ControlIN
yM
y
Decision Block
Estimator(RPR)
yr
Arrivaltime[s]
N.ofp
aths
Tclk
Arrivaltime[s]
N.ofpaths
TclkVOS
Viol.
8
ACinProcessing:OurContributionClassicRPRhaslimitations:§ Replicadesignanderrorestimation
requireknowledgeoffunctionality(designspecific)
§ Usessimplifiedandunrealisticassumptions(e.g.oninputstatistics)
ProposedFramework:§ AutomaticallyaddRPRtoexisting
gate-levelnetlistofadatapathcircuit.
Features:§ Functionality-agnostic.§ Simulation-based.§ Integratedwithstate-of-the-art
tools forsynthesisandsimulation.
𝝈𝒔𝟐
𝝈𝒚𝒓𝟐 +𝝈𝜼𝟐
MDSPNetlist
RepresentativeInputSet
QualityEvaluation
01101110100111010001011000011011
RPRAutomation
Engine
RPRNetlist
EDATools(e.g.DC)
9
ACinProcessing:ResultsSetup:§ 45nmlibraryfromSTM.§ Opencores designs,realisticquality
constraintsGenerality:§ SuccessfullyappliedRPRto
previouslyuntesteddesigns(CORDIC,SRU).
§ Comparablesavingsw.r.t.ad-hocapproach onFIRandFFT.
Benchmark Tot. Power Saving [%]
Area Ovr. [%]
FIR Filter 44.96 82.39FFT Butterfly 49.66 133.20RM-CORDIC 42.05 127.64SRU 47.91 143.32
Benefitsofsimulation-basedapproach:§ Differentinputstimulicausedifferent
errorrateontheMDSP,atthesameVVOS.§ Consequently,alarger/smallerreplicacan
beusedtoobtainthesamequality.§ Strongimpactofinputsonthe
obtainablepowersavings.
>20%difference!!
RPRpowersavingvsvoltageforaFIRfilter,withdifferentinputstimuli(samequalityconstraint).
10
ACinInterconnects:MotivationSerialbuses:§ De-factostandardforinterfacingsensors,
actuatorsandI/Ocontrollers§ Higherfrequencies,nojitterissue,
reducedcrosstalk§ Lowercosts(lesspinsandeasierwiring)§ SPI,I2C,MIPI,etc.
Motivation:§ PCBtraceshavelargecapacitiveloadsthat
havenotscaledastransistors!§ Transmissionofone12bitsample »
executionof1instruction![1][2]
§ Tens ofserialconnectionsinasystem!
[1] P. Stanley-Marbell and M. Rinard. Value-deviation-bounded serial data encoding for energy-efficient approximate communication. 2015 [2] N. Ickes, et al. . A 10-pJ/instruction, 4-MIPS micropower DSP for sensor applications. 2008.
ErrorTolerantBusTraces:§ SensorICs/multimediaactuators(audio
DAC,displays)
§ Long“idle” (roughlyconstant)phases.§ Short“bursty”(fastandlargevariations)
phases.§ Example: Lena image(redchannel)
(Most)informationconveyedbythebursty phases!
11
ACinInterconnects:ST0/ADETwoEncodingswithcommonPrinciples:§ Exploitidlephasesforpowersaving!§ Avoidredundancy(introduceslargeoverheadsinserialbuses)§ Allowruntime-reconfigurationofacceptederror.§ Simpleimplementation(CODECHWoverheadsmustnotoffsetgains).
§ SerialT0(ST0):§ Selectivelytransmitthecorrectdatumoraspecial0-Transitionspattern(interpretedas“repeatpreviousdatum").
§ |b(t)– b(t’)|> Thà Sendcorrectdata§ |b(t)– b(t’)|≤ Thà Send0-Tpattern.
§ ApproximateDifferentialEncoding(ADE):§ BasedonbitwiseDifferentialEncoding(DE):B(t)=b(t)⊕ b(t-1)§ EnhancedwithLSB-saturation toreducetransitionsalsoduringbursty phases
12
ACinInterconnects:Results
0.05 0.1 0.15 0.2 0.25 0.3 0.35Error [%]
10
20
30
40
50
60
70
TC R
educ
tion
[%]
ADELSBSRAKEST0DE
0 0.1 0.2 0.3 0.4Error [%]
10
20
30
40
50
60
70
80
90
TC R
educ
tion
[%]
ADELSBSRAKEST0DE
1 2 3 4 5 6Error [%]
10
20
30
40
50
60
70
80
90
TC R
educ
tion
[%]
ADELSBSRAKEST0DE
Comparison:§ Rake [Stanley-Marbell,DAC’16]§ LSBS andAccurateDE
Results:§ ADEandST0arebothsuperiortostate-
of-the-art§ ST0betterfor“strongburstiness”,ADE
superiorformorerandomdata.
Images [Kodak Database] ECG [Physionet Database]
Accelerometer [PSR Database]
⋍+40%
⋍+18%
⋍+50%
13
ACinActuators:OLEDDisplays
OLEDDisplays:§ Brighterandbetterviewingangles
w.r.t.LCDs§ Thinnerand/orflexiblepanels
OLEDsareemissive:§ Powerstronglydependsonpixels
luminanceand(secondarily)color
§ Poweroptimizationcanbeachievedwithanimagetransformation!(≠LCD)
Motivation:§ Transformationsforgeneralimagesmustpreservecontrastwhilereducingpowerconsumption.
§ Existingsolutionsarecomputationallyintensive.
§ Poweroverhead?§ Realtime applicability?
Claim:§ Similartransformationscanbeobtainedbymuchsimpler(approximate)computations.
14
ACinActuators:OLEDDisplays
3rd OrderPolynomialFit:§ Transformimagesaccordingtoa3rd
orderpolynomialfunctionoftheinputluminance (YUVspace)
§ Polynomialevaluationvs.histogramprocessing,etc.
§ Simplerandfeweroperations(ADD,MULT)
Approach:1. OfflineTrainingPhase
(ComputationallyIntensive):
2. OnlineTransformation(LinearComplexity):
0 0.2 0.4 0.6 0.8 1Input Luminance (Y)
-0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Outpu
t Lum
inanc
e (Y t)
Data from Lee et al.3rd-order Polynomial Fit
Lee et al, TIP 2010
ImageDBExtraction offunction
parameters
QualityConstraints
Params.
Determine and apply image-
specific transformation
Params.
15
ACinActuators:Results
§ Comparablequalityatiso-savingsw.r.t.state-of-the-art§ Visuallyandquantitatively:
§ Muchlowercomplexity!(SWorHW)§ 10xfasterthanLeeetal.§ Minimalpoweroverhead forHWimplementation.
Original Leeetal. Proposed
16
ACinActuators:Results
§ Comparablequalityatiso-savingsw.r.t.state-of-the-art§ Visuallyandquantitatively:
§ Muchlowercomplexity!(SWorHW)§ 10xfasterthanLeeetal.§ Minimalpoweroverhead forHWimplementation.
Original Leeetal. Proposed
MSSIM0.692
MSSIM0.799
MSSIM0.860
MSSIM0.691
17
ACinActuators:Results
§ Comparablequalityatiso-savingsw.r.t.state-of-the-art§ Visuallyandquantitatively:
§ Muchlowercomplexity!(SWorHW)§ 10xfasterthanLeeetal.§ Minimalpoweroverhead forHWimplementation.
Original Leeetal. Proposed
Saving61.6 %
Saving59.5 %
Saving55.1 %
Saving69.4 %
18
Conclusions
§ Exploringtheenergyversusqualitytradeoffcanbeinterestingatsystemlevel:
§ Thecomputationpart isnotalwaystheonetoblame.
§ Automation aspectsarekeytothewidespreaddiffusionofthesedesigntechniques.
§ OpenIssues/FutureWork:§ ACinmemories?§ ACinsensors (and/or,ADCs)?§ Howto combineACtechniquesindifferentpartsofthesystemtomaximizetotalpowersavings?
§ (e.g.,encoding+RPR)
19
THANKYOU!
top related