PERFECT Case Studies Demonstrating Order of Magnitude Reduction in Power Consumption David K. Wittenberg, Edin Kadric, Andre DeHon, Jonathan Edwards, Jeffrey Smith, and Silviu Chiricescu


Post on 28-Sep-2020


Page 1: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration

PERFECT Case Studies Demonstrating Order of Magnitude Reduction in Power Consumption

David K. Wittenberg, Edin Kadric, Andre DeHon, Jonathan Edwards, Jeffrey Smith, and Silviu Chiricescu


The Problem – Wide Area Motion Imaging (WAMI)

• Want real-time data
• High resolution - 368 cellphone cameras
• 1.8 Gpixels @ 10 Hz
• Airborne
• Limited bandwidth to ground
• Limited power
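The sensor figures on this slide imply a raw pixel rate that the airborne processor has to keep up with. A minimal arithmetic sketch, using only the numbers on the slide; the constant and function names are illustrative, and no bit depth is assumed, so results are in pixels rather than bytes:

```python
# Figures copied from the slide; the names are illustrative, not from the source.
NUM_CAMERAS = 368          # cellphone cameras in the sensor
TOTAL_PIXELS = 1.8e9       # 1.8 Gpixels per composite frame
FRAME_RATE_HZ = 10         # frames per second


def pixels_per_camera(total_pixels=TOTAL_PIXELS, cameras=NUM_CAMERAS):
    """Average number of pixels contributed by each camera."""
    return total_pixels / cameras


def raw_pixel_rate(total_pixels=TOTAL_PIXELS, rate_hz=FRAME_RATE_HZ):
    """Pixels per second produced by the whole sensor."""
    return total_pixels * rate_hz


if __name__ == "__main__":
    print(f"pixels per camera: {pixels_per_camera() / 1e6:.2f} Mpixel")
    print(f"raw pixel rate:    {raw_pixel_rate() / 1e9:.1f} Gpixel/s")
```

Each camera contributes roughly 5 Mpixel, and the sensor as a whole produces 18 Gpixel/s before any processing or compression.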


ARGUS-IS System Components

ARGUS-IS system components. Left: sensor in 25” gimbal; Right: CFPA with 92 5-MPixel FPAs.

ARGUS-IS ruggedized airborne processor


Our Solution

• Lower Voltage
• Light-Weight Checks
• Explore Parallelism
• Continuous Hierarchy Memory


ARGUS-IS context drives PERFECT energy improvements

© BAE Systems 2013

[Figure: Low Energy Array of Reconfigurable Nodes. An array of processing tiles (IMEM, RF, EXU, P Ctrl, DMEM, MM Network Switch) connected through network switches to bulk memory banks, with Array Diag. & Repair blocks, an off-chip memory interface, and Reliable Cores and Infrastructure.]

WAMI pipeline: Raw image → Demosaicing (Debayer) → Registration (Lucas-Kanade) → Motion Detection and Tracking (Gaussian Mixture Model) → Final image

Measure energy on the PRACTICE LEARN WAMI implementation

Measure energy on the ARGUS-IS WAMI pipeline components


Parallel Optimized Design

512×512 image

Stage                    Demosaicing   Registration   Motion Detection and Tracking
Kernel                   Debayer       LK             GMM                             WAMI
PEs                      1             4              8                               (4,4,4)
Ops/Cycle                20            220            680                             360
Area (cm²)               0.0046        0.33           0.2                             0.55
0.7 V Vdd
  Throughput (fps)       540           150            1600                            150
  Efficiency (Gops/W)    740           280            910                             340
0.5 V Vdd
  Throughput (fps)       110           30             320                             50
  Efficiency (Gops/W)    1500          560            1800                            680
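The table shows the voltage trade-off directly: dropping Vdd from 0.7 V to 0.5 V roughly doubles Gops/W while cutting throughput by a factor of 3 to 5. The sketch below computes those ratios from the table values. It also compares the efficiency gain with the first-order CV² model of switching energy; that model is an assumption introduced here for comparison and is not stated on the slides.

```python
# Values copied from the Parallel Optimized Design table (512x512 image).
KERNELS = ("Debayer", "LK", "GMM", "WAMI")
FPS = {0.7: (540, 150, 1600, 150), 0.5: (110, 30, 320, 50)}
GOPS_PER_W = {0.7: (740, 280, 910, 340), 0.5: (1500, 560, 1800, 680)}


def scaling_ratios(high_v=0.7, low_v=0.5):
    """Return {kernel: (efficiency gain, throughput loss)} when lowering Vdd."""
    ratios = {}
    for i, name in enumerate(KERNELS):
        efficiency_gain = GOPS_PER_W[low_v][i] / GOPS_PER_W[high_v][i]
        throughput_loss = FPS[high_v][i] / FPS[low_v][i]
        ratios[name] = (efficiency_gain, throughput_loss)
    return ratios


def cv2_energy_ratio(high_v=0.7, low_v=0.5):
    """Assumed first-order model: switching energy scales with Vdd squared."""
    return (high_v / low_v) ** 2


if __name__ == "__main__":
    for name, (gain, loss) in scaling_ratios().items():
        print(f"{name:8s} efficiency x{gain:.2f}, throughput /{loss:.2f}")
    print(f"CV^2 model predicts x{cv2_energy_ratio():.2f} energy per operation")
```

Under that assumed model the predicted gain is (0.7/0.5)² ≈ 1.96, close to the roughly 2x efficiency improvement in every column of the table.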


Continuous Hierarchy Memory

[Figure: Continuous Hierarchy Memory organization. Labels: Data, Addr[n-1], Addr[n-2], Addr[n-3:0], Lmseg(M/4), Lmseg(M).]


[Figure: Lucas-Kanade Energy/Frame. Stacked bars of energy (J, axis from 0 to 0.12) split into mem, route, logic, and clk, plotted against memory size: 64k, 128k, 64k-CHM, 128k-CHM, 64k-CHM-S, 128k-CHM-S.]


Light Weight Checks

• Example: Sorting vs. checking that it is sorted
• Not needed in many imaging algorithms, as they iterate until errors are small enough
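The sorting example shows why a check can be much cheaper than the computation it guards: a comparison sort needs O(n log n) comparisons, while verifying that the result is sorted is a single O(n) pass. A minimal sketch of that idea follows. The `unreliable_sort` argument is a hypothetical stand-in for a computation run on low-voltage hardware, and the multiset comparison, retry count, and trusted fallback are assumptions added here, not details from the slides.

```python
from collections import Counter


def is_sorted(values):
    """Linear-time check: every element is <= its successor."""
    return all(a <= b for a, b in zip(values, values[1:]))


def same_elements(original, candidate):
    """Linear-time check that no element was lost, added, or altered."""
    return Counter(original) == Counter(candidate)


def checked_sort(values, unreliable_sort, max_attempts=3):
    """Run a possibly faulty sort and accept its result only if the checks pass.

    `unreliable_sort` is hypothetical: any callable that usually returns a
    sorted copy of its input but may occasionally return a wrong answer.
    """
    for _ in range(max_attempts):
        result = unreliable_sort(list(values))
        if is_sorted(result) and same_elements(values, result):
            return result
    # Assumed fallback: redo the work on a trusted (reliable) path.
    return sorted(values)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_sort(data):
        """Toy fault model: the first call swaps the first and last outputs."""
        attempts["count"] += 1
        out = sorted(data)
        if attempts["count"] == 1 and len(out) > 1:
            out[0], out[-1] = out[-1], out[0]
        return out

    print(checked_sort([3, 1, 2], flaky_sort), "after", attempts["count"], "attempts")
```

The second bullet on the slide points at the complementary case: an iterative algorithm that loops until its error is small already contains its own check, so a separate one adds little.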


Differential vs. Single Voltage (22 nm example)

[Figure: two panels, each with a Vdd axis.]


PERFECT Results and Comparison with ARGUS

Comparison of ARGUS-IS GPU and PERFECT output

Technology      Power (W)   Time (s)   Energy (J)   GOps     GOPS/W
ARGUS-IS GPU    77.4        6.5        507.2        940.6    1.9
PERFECT         0.5         17.1       0.9          4096.6   439.9
Ratio           0.01        2.6        0.002        4.4      236.9

• PERFECT Simulated Output for WAMI Pipeline
  – Scaled from 512×512 pixels to 5 Mpixel and further scaled to achieve a 3000-frame run
• 7 nm process technology assumed as best-case
  – Varying Vdd and Mem-size led to optimized GOPS/W
    • Vdd = 0.7, Mem size = 64k
• Key Results:
  – 237X GOPS/W increase even with memory intensive pipeline
  – Performed 4X more operations due to differences in implementation
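The columns of the comparison table are related by simple arithmetic: energy is power times time, and GOPS/W equals operations per joule. The sketch below recomputes those derived quantities and the PERFECT-to-ARGUS ratios from the printed values; the dictionary keys and function names are illustrative. Because the table entries are rounded, recomputed ratios can differ slightly from the printed Ratio row. For the PERFECT row, both derivations give roughly 9 J rather than the tabulated 0.9 J; the sketch only reports this and makes no assumption about which entry is intended.

```python
# Values copied from the comparison table; key names are illustrative.
ROWS = {
    "ARGUS-IS GPU": {"power_w": 77.4, "time_s": 6.5, "energy_j": 507.2,
                     "gops": 940.6, "gops_per_w": 1.9},
    "PERFECT": {"power_w": 0.5, "time_s": 17.1, "energy_j": 0.9,
                "gops": 4096.6, "gops_per_w": 439.9},
}


def derived_energy(row):
    """Energy in joules implied by the other columns of one table row."""
    return {
        "from_power_and_time": row["power_w"] * row["time_s"],
        "from_gops_and_efficiency": row["gops"] / row["gops_per_w"],
    }


def ratio(column):
    """PERFECT value divided by ARGUS-IS GPU value for one column."""
    return ROWS["PERFECT"][column] / ROWS["ARGUS-IS GPU"][column]


if __name__ == "__main__":
    for name, row in ROWS.items():
        d = derived_energy(row)
        print(f"{name}: tabulated {row['energy_j']} J, "
              f"P*t = {d['from_power_and_time']:.2f} J, "
              f"GOps/(GOPS/W) = {d['from_gops_and_efficiency']:.2f} J")
    for column in ("time_s", "gops", "gops_per_w"):
        print(f"ratio[{column}] = {ratio(column):.2f}")
```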


TAILWIND: PERFECT gives 6-15X Improvement

                     TAILWIND SYSTEM                          PERFECT SYSTEM
                     Image Registration   Motion Detection    Image Registration (LK)   Motion Detection (GMM)
                     (SLAM)               (RFD)               STD        CHM            STD        CHM
Energy/Frame (mJ)    752                  237                 175        129            19         14
Time/Frame (ms)      253                  83                  444                       37

• Motion Detection – PERFECT kernel consumes ≈ 15X less energy than the TAILWIND kernel
• Image Registration – PERFECT kernel consumes ≈ 6X less energy than the TAILWIND kernel
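The quoted improvement factors can be recomputed from the energy-per-frame row of the table. A short sketch, with illustrative names, that divides each TAILWIND kernel's energy by the corresponding PERFECT kernel's energy for both the standard (STD) and CHM memory variants:

```python
# Energy per frame in mJ, copied from the table; key names are illustrative.
TAILWIND_MJ = {"registration": 752, "motion_detection": 237}
PERFECT_MJ = {
    "registration": {"STD": 175, "CHM": 129},
    "motion_detection": {"STD": 19, "CHM": 14},
}


def improvement(task, memory="CHM"):
    """How many times less energy the PERFECT kernel uses per frame."""
    return TAILWIND_MJ[task] / PERFECT_MJ[task][memory]


if __name__ == "__main__":
    for task in TAILWIND_MJ:
        for memory in ("STD", "CHM"):
            print(f"{task:17s} {memory}: {improvement(task, memory):.1f}x")
```

With CHM the ratios come to about 5.8x for image registration and 16.9x for motion detection; with standard memory they are about 4.3x and 12.5x.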


Summary

1. Exceeded PERFECT energy efficiency goals by up to 10x (440-680 Gops/W) on a common set of Wide Area Motion Imagery (WAMI) kernels
2. The projected Argus power consumption reduction enables:
   • Airborne, real-time 3D situational awareness to the warfighter along with:
     – Miniaturization of 3D mapping technology currently housed in multiple server racks
     – Increased flight mission time and track detection accuracy with improved data compression in manned flights
   • Autonomous threat detection, tracking and multi-sensor fusion
3. Designed a novel FPGA based architecture that combines ultra low power operation with high reliability
   • Optimize the substrate and design mapping to minimize communication energy


Acronyms Used

ARGUS      Autonomous Real-time Ground-Ubiquitous Surveillance-Imaging System
CHM        Continuous Hierarchy Memory
FIT        Failures In Time
FPGA       Field Programmable Gate Array
GMM        Gaussian Mixture Model
LEARN      Low-Energy Architecture of Reconfigurable Nodes
L-K        Lucas-Kanade
LWC        Light Weight Checks
PERFECT    Power Efficiency Revolution for Embedded Computing Technology
RANSAC     RANdom SAmple Consensus
RFD        Robust Frame Differences
SLAM       Simultaneous Location And Mapping
TAILWIND   Tactical Aircraft to Increase Long Wave Infrared Nighttime Detection
UAV        Unmanned Aerial Vehicle
WAMI       Wide Area Motion Imagery


BACKUP


Minimal Streaming Design

Fully functional pipeline, 512×512 image

Stage                    Demosaicing   Registration   Motion Detection and Tracking
Kernel                   Debayer       LK             GMM                             WAMI
PEs                      1             1              1                               (1,1,1)
Ops/Cycle                20            55             85                              90
Area (cm²)               0.0046        0.13           0.034                           0.17
0.7 V Vdd
  Throughput (fps)       540           58             430                             57
  Efficiency (Gops/W)    740           250            650                             300
0.5 V Vdd
  Throughput (fps)       110           12             85                              11
  Efficiency (Gops/W)    1500          500            1300                            600
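Setting this table beside the Parallel Optimized Design table from the main slides shows what the extra processing elements buy. The sketch below compares the two designs at 0.7 V Vdd, kernel by kernel, using only the tabulated values; the dictionary layout and names are illustrative.

```python
# 0.7 V Vdd values copied from the two design tables (512x512 image).
MINIMAL = {
    "Debayer": {"area_cm2": 0.0046, "fps": 540, "gops_per_w": 740},
    "LK": {"area_cm2": 0.13, "fps": 58, "gops_per_w": 250},
    "GMM": {"area_cm2": 0.034, "fps": 430, "gops_per_w": 650},
    "WAMI": {"area_cm2": 0.17, "fps": 57, "gops_per_w": 300},
}
PARALLEL = {
    "Debayer": {"area_cm2": 0.0046, "fps": 540, "gops_per_w": 740},
    "LK": {"area_cm2": 0.33, "fps": 150, "gops_per_w": 280},
    "GMM": {"area_cm2": 0.2, "fps": 1600, "gops_per_w": 910},
    "WAMI": {"area_cm2": 0.55, "fps": 150, "gops_per_w": 340},
}


def compare(kernel):
    """Ratios of the parallel design to the minimal design for one kernel."""
    m, p = MINIMAL[kernel], PARALLEL[kernel]
    return {
        "speedup": p["fps"] / m["fps"],
        "area_ratio": p["area_cm2"] / m["area_cm2"],
        "efficiency_ratio": p["gops_per_w"] / m["gops_per_w"],
    }


if __name__ == "__main__":
    for kernel in MINIMAL:
        c = compare(kernel)
        print(f"{kernel:8s} speedup x{c['speedup']:.2f}, "
              f"area x{c['area_ratio']:.2f}, "
              f"efficiency x{c['efficiency_ratio']:.2f}")
```

For the full WAMI pipeline the parallel design is about 2.6x faster in about 3.2x the area, with a modest gain in Gops/W; the Debayer kernel is identical in both designs.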


WAMI on FPGA


Optimizing Parallelism

[Figure: panels for DeBayer, LK, and GMM; 512×512 image.]


[Figure: GMM Energy/Frame. Stacked bars of energy (J, axis from 0 to 0.025) split into mem, route, logic, and clk, plotted against memory size: 64k, 128k, 64k-CHM, 128k-CHM, 64k-CHM-S, 128k-CHM-S.]


Energy reduction techniques*

• Use low-power technology
• Voltage scaling
• Use T-gates or gate-boosting (GB) instead of pass-T w/ level restorer
• Use dual-Vdd
• Power gating

* Edin Kadric, “Energy reduction through voltage scaling and lightweight checking”, PhD thesis.


TAILWIND System Level Metrics

Data Set           System Level Metric      SLAM + RFD   LK + RFD   SLAM + GMM
Quantico Scene 1   P(vehicle detection)     0.9          0.53       0.5
                   P(dismount detection)    0.64         0.38       0.46
                   False alarm/minute       1.35         103        163
                   P(tracking vehicle)      0.3          0.1        0
                   P(tracking dismount)     0.33         0.14       0.14
Quantico Scene 2   P(vehicle detection)     0.82         0.89       0.3
                   P(dismount detection)    0.52         0.71       0.25
                   False alarm/minute       8.9          149        124
                   P(tracking vehicle)      0.42         0.25       0
                   P(tracking dismount)     0.44         0.28       0.1
Quantico Scene 3   P(vehicle detection)     0.85         0.54       0.24
                   P(dismount detection)    0.6          0.42       0.19
                   False alarm/minute       4.1          117        126
                   P(tracking vehicle)      0.15         0.13       0.03
                   P(tracking dismount)     0.21         0.09       0.27

• TAILWIND
  – Fine-tuned parameters for the different kernels
• PERFECT
  – Dismounts are at granularity of a few pixels which is more than the difference between consecutive frames: large false alarm rates
  – GMM has lower levels of accuracy than RFD
  – Varying the GMM parameters (e.g. learning rate and # of


ARGUS-IS Public Release Poster


References from PERFECT project

1. Edin Kadric, David Lakata, and André DeHon, “Impact of Memory Architecture on FPGA Energy Consumption”, FPGA ’15, Monterey, California, USA, Feb 22-24, 2015, http://ic.ese.upenn.edu/pdf/meme_fpga2015.pdf.

2. Benjamin Gojman, Sirisha Nalmela, Nikil Mehta, Nicholas Howarth, and André DeHon, “GROK-LAB: Generating Real On-chip Knowledge for Intra-cluster Delays using Timing Extraction”, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 7, Number 4, DOI: 10.1145/2597889, December 2014.

3. E. Kadric, K. Mahajan, A. DeHon, “Kung Fu Data Energy - Minimizing Communication Energy in FPGA Computations”, IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), May 11-13, 2014.

4. E. Kadric, K. Mahajan, A. DeHon, “Energy Reduction through Differential Reliability and Lightweight Checking”, FPGA ’14, February 26-28, 2014, Monterey, California, USA.

5. Feng Liu, Soumyadeep Ghosh, Nick P. Johnson, David I. August, “CGPA: Coarse-Grained Pipelined Accelerators”, Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, http://dl.acm.org/citation.cfm?doid=2593069.2593105

6. Andre DeHon, “Fundamental Underpinnings of Reconfigurable Computing Architectures”, accepted for publication in the Proceedings of the IEEE, Volume 103, Issue 3, DOI: 10.1109/JPROC.2014.2387696, March 2015, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7086421

7. Feng Liu, Heejin Ahn, Stephen R. Beard, Taewook Oh, and David I. August, “DynaSpAM: Dynamic Spatial Architecture Mapping using Out of Order Instruction Schedules”, Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), ISBN 978-1-4503-3402-0/15/06, Portland, OR, USA, June 13-17, 2015.

8. Edin Kadric, David Lakata, and André DeHon, “Impact of Parallelism and Memory Architecture on FPGA Communication Energy”, ACM Transactions on Reconfigurable Technology and Systems, Vol. 9, to be published, 2016.

9. “Demonstration of Applicability of PERFECT/PRACTICE Technology to TAILWIND Processing”, BAE Systems Technical Report for DARPA PERFECT Program

10. “Demonstration of Applicability of PERFECT-PRACTICE Technology to ARGUS-IS Processing”, BAE Systems Technical Report for DARPA PERFECT Program