![Page 1: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/1.jpg)
PERFECT Case Studies Demonstrating Order of Magnitude Reduction in Power Consumption
David K. Wittenberg, Edin Kadric, Andre DeHon, Jonathan Edwards, Jeffrey Smith, and Silviu Chiricescu
![Page 2: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/2.jpg)
The Problem – Wide Area Motion Imaging (WAMI)
• Want real-time data
• High resolution - 368 cell phone cameras
• 1.8 Gpixels @ 10 Hz
• Airborne
• Limited bandwidth to ground
• Limited power
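The sensor figures on this slide imply a raw pixel rate and a per-camera resolution. The sketch below only does that arithmetic; the variable names are illustrative, and nothing is assumed about bits per pixel, which the slide does not give.

```python
# Figures quoted on the slide.
TOTAL_PIXELS = 1.8e9   # 1.8 Gpixels per frame
FRAME_RATE_HZ = 10     # frames per second
NUM_CAMERAS = 368      # cell phone camera sensors

# Raw pixel rate the airborne processor has to keep up with.
pixels_per_second = TOTAL_PIXELS * FRAME_RATE_HZ

# Average resolution of one camera sensor.
pixels_per_camera = TOTAL_PIXELS / NUM_CAMERAS

print(f"{pixels_per_second:.2e} pixels/s")
print(f"{pixels_per_camera / 1e6:.1f} Mpixels per camera")
```

That is 18 Gpixels every second, or roughly 4.9 Mpixels per camera, which is why the limited bandwidth to the ground pushes the processing onto the aircraft.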
![Page 3: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/3.jpg)
ARGUS-IS System Components
ARGUS-IS system components. Left: sensor in 25” gimbal; Right: CFPA with 92 5 MPixel FPAs.
ARGUS-IS ruggedized airborne processor
![Page 4: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/4.jpg)
Our Solution
• Lower Voltage
• Light-Weight Checks
• Explore Parallelism
• Continuous Hierarchy Memory
![Page 5: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/5.jpg)
ARGUS-IS context drives PERFECT energy improvements
© BAE Systems 2013
[Figure: block diagram of the Low Energy Array of Reconfigurable Nodes. An array of nodes, each containing IMEM, RF, EXU, P Ctrl, DMEM, blocks labeled M, and a Network Switch; Bulk Memory Banks, each attached to a Network Switch; Array Diag. & Repair; Off-chip Memory Interface; Reliable Cores and Infrastructure.]
WAMI pipeline: Raw image → Demosaicing (Debayer) → Registration (Lucas-Kanade) → Motion Detection and Tracking (Gaussian Mixture Model) → Final Image
• Measure energy on the PRACTICE LEARN WAMI implementation
• Measure energy on the ARGUS-IS WAMI pipeline components
![Page 6: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/6.jpg)
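Parallel Optimized Design
512×512 Image

| Metric | Debayer (Demosaicing) | LK (Registration) | GMM (Motion Detection and Tracking) | WAMI |
|---|---|---|---|---|
| PEs | 1 | 4 | 8 | (4,4,4) |
| Ops/Cycle | 20 | 220 | 680 | 360 |
| Area (cm²) | 0.0046 | 0.33 | 0.2 | 0.55 |
| Throughput (fps), 0.7 V Vdd | 540 | 150 | 1600 | 150 |
| Efficiency (Gops/W), 0.7 V Vdd | 740 | 280 | 910 | 340 |
| Throughput (fps), 0.5 V Vdd | 110 | 30 | 320 | 50 |
| Efficiency (Gops/W), 0.5 V Vdd | 1500 | 560 | 1800 | 680 |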
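The efficiency rows can be compared against a simple voltage-scaling model. The sketch below assumes that switching energy per operation scales with the square of Vdd and ignores leakage. That model is a standard first-order approximation, not something stated on the slide, and the dictionary names are illustrative.

```python
# Efficiency (Gops/W) from the Parallel Optimized Design table.
eff_07v = {"Debayer": 740, "LK": 280, "GMM": 910, "WAMI": 340}
eff_05v = {"Debayer": 1500, "LK": 560, "GMM": 1800, "WAMI": 680}

# Assumption: energy per operation is proportional to Vdd squared,
# so efficiency should improve by (0.7 / 0.5) ** 2 when Vdd drops.
predicted_gain = (0.7 / 0.5) ** 2

# Efficiency gain actually reported in the table, per kernel.
measured_gain = {k: eff_05v[k] / eff_07v[k] for k in eff_07v}

print(f"predicted: {predicted_gain:.2f}x")
for kernel, gain in measured_gain.items():
    print(f"{kernel}: {gain:.2f}x")
```

Under this assumption the predicted gain is about 1.96x, and every column of the table lands within a few percent of 2x. The price is throughput, which drops by a factor of three to five at 0.5 V.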
![Page 7: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/7.jpg)
Continuous Hierarchy Memory
[Figure: Continuous Hierarchy Memory organization. Labels: Data, Addr[n-1], Addr[n-2], Addr[n-3:0], Lmseg(M/4), Lmseg(M).]
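The figure labels suggest that an n-bit address is split into three fields: Addr[n-1], Addr[n-2], and Addr[n-3:0]. A minimal sketch of that split follows. The function name is hypothetical, and how the memory uses each field (for example, the two top bits selecting among segments of different reach) is not spelled out on the slide, so the code stops at extracting the fields.

```python
from typing import Tuple


def split_address(addr: int, n: int) -> Tuple[int, int, int]:
    """Split an n-bit address into (Addr[n-1], Addr[n-2], Addr[n-3:0])."""
    if n < 3:
        raise ValueError("need at least 3 address bits")
    if not 0 <= addr < (1 << n):
        raise ValueError("address does not fit in n bits")
    top = (addr >> (n - 1)) & 1          # Addr[n-1]
    second = (addr >> (n - 2)) & 1       # Addr[n-2]
    low = addr & ((1 << (n - 2)) - 1)    # Addr[n-3:0]
    return top, second, low


# Example with a 4-bit address 0b1011.
print(split_address(0b1011, 4))
```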
![Page 8: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/8.jpg)
[Figure: bar chart, Lucas-Kanade Energy/Frame. X-axis: Memory Size (64k, 128k, 64k-CHM, 128k-CHM, 64k-CHM-S, 128k-CHM-S). Y-axis: Energy (J), 0 to 0.12. Legend: mem, route, logic, clk.]
![Page 9: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/9.jpg)
Light Weight Checks
• Example: Sorting vs. checking that it is sorted
• Not needed in many imaging algorithms, as they iterate until errors are small enough
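The sorting example can be sketched as follows: checking a result is much cheaper than computing it, so the computation can run on less reliable (for example, low-voltage) hardware and be redone only when the check fails. This is an illustration under assumptions, not the authors' implementation. `fast_sort` and `reliable_sort` are hypothetical stand-ins for the unreliable and reliable execution paths.

```python
from collections import Counter
from typing import Callable, List, Sequence


def is_sorted(values: Sequence[int]) -> bool:
    """Linear-time check that each element is <= its successor."""
    return all(a <= b for a, b in zip(values, values[1:]))


def checked_sort(
    values: Sequence[int],
    fast_sort: Callable[[Sequence[int]], List[int]],
    reliable_sort: Callable[[Sequence[int]], List[int]] = sorted,
) -> List[int]:
    """Run the cheap path, verify it, and fall back only on failure."""
    result = fast_sort(values)
    # The check is linear: the order test plus a multiset comparison
    # confirming the output is a permutation of the input.
    if is_sorted(result) and Counter(result) == Counter(values):
        return result
    return reliable_sort(values)


# A faulty "sorter" that returns its input unchanged is caught by the check.
print(checked_sort([3, 1, 2], fast_sort=list))
```

The check costs O(n) against O(n log n) for the sort itself, so it adds little energy when the fast path usually succeeds.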
![Page 10: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/10.jpg)
Differential vs. Single Voltage (22 nm example)
[Figure: plots with axes labeled Vdd.]
![Page 11: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/11.jpg)
PERFECT Results and Comparison with ARGUS

Comparison of ARGUS-IS GPU and PERFECT output

| Technology | Power (W) | Time (s) | Energy (J) | GOps | GOPS/W |
|---|---|---|---|---|---|
| ARGUS-IS GPU | 77.4 | 6.5 | 507.2 | 940.6 | 1.9 |
| PERFECT | 0.5 | 17.1 | 0.9 | 4096.6 | 439.9 |
| Ratio | 0.01 | 2.6 | 0.002 | 4.4 | 236.9 |

• PERFECT simulated output for WAMI pipeline
  – Scaled from 512×512 pixels to 5 Mpixel, and further scaled to achieve a 3000-frame run
• 7 nm process technology assumed as best case
  – Varying Vdd and memory size led to optimized GOPS/W
  – Vdd = 0.7 V, Mem size = 64k
• Key results:
  – 237X GOPS/W increase even with a memory-intensive pipeline
  – Performed 4X more operations due to differences in implementation
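The Ratio row is PERFECT divided by ARGUS-IS GPU, column by column. The sketch below recomputes it from the rounded entries as printed in the table. The dictionary keys are illustrative names. Because the inputs are rounded, the GOPS/W ratio comes out near 232 instead of the slide's 236.9, which was presumably computed from unrounded values.

```python
# Table entries as printed on the slide (rounded).
argus = {"power_w": 77.4, "time_s": 6.5, "energy_j": 507.2,
         "gops": 940.6, "gops_per_w": 1.9}
perfect = {"power_w": 0.5, "time_s": 17.1, "energy_j": 0.9,
           "gops": 4096.6, "gops_per_w": 439.9}

# Ratio row: PERFECT value divided by ARGUS-IS GPU value.
ratio = {key: perfect[key] / argus[key] for key in argus}

for key, value in ratio.items():
    print(f"{key}: {value:.4g}")
```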
![Page 12: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/12.jpg)
TAILWIND: PERFECT gives 6-15X Improvement

| System | Kernel | Energy/Frame (mJ) | Time/Frame (ms) |
|---|---|---|---|
| TAILWIND | Image Registration (SLAM) | 752 | 253 |
| TAILWIND | Motion Detection (RFD) | 237 | 83 |
| PERFECT | Image Registration (LK) | 175 (STD), 129 (CHM) | 444 |
| PERFECT | Motion Detection (GMM) | 19 (STD), 14 (CHM) | 37 |

• Motion Detection – PERFECT kernel consumes ≈ 15X less energy than the TAILWIND kernel
• Image Registration – PERFECT kernel consumes ≈ 6X less energy than the TAILWIND kernel
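The ≈6X and ≈15X figures can be reproduced from the energy-per-frame values. The sketch below assumes the six energy numbers map to columns as TAILWIND SLAM, TAILWIND RFD, PERFECT LK (STD, CHM), and PERFECT GMM (STD, CHM), which is a reading of the flattened slide layout. The dictionary names are illustrative.

```python
# Energy per frame in mJ, read from the slide.
tailwind_mj = {"registration": 752, "motion_detection": 237}
perfect_mj = {
    "registration": {"STD": 175, "CHM": 129},
    "motion_detection": {"STD": 19, "CHM": 14},
}

# Energy reduction factor: TAILWIND energy over PERFECT energy.
reduction = {
    kernel: {mem: tailwind_mj[kernel] / energy for mem, energy in variants.items()}
    for kernel, variants in perfect_mj.items()
}

for kernel, variants in reduction.items():
    for mem, factor in variants.items():
        print(f"{kernel} ({mem}): {factor:.1f}x less energy")
```

Under that reading, registration improves by about 4.3x (STD) to 5.8x (CHM) and motion detection by about 12.5x (STD) to 16.9x (CHM), which brackets the rounded figures in the bullets.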
![Page 13: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/13.jpg)
Summary
1. Exceeded PERFECT energy efficiency goals by up to 10x (440-680 Gops/W) on a common set of Wide Area Motion Imagery (WAMI) kernels
2. The projected Argus power consumption reduction enables:
  • Airborne, real-time 3D situational awareness to the warfighter along with:
    – Miniaturization of 3D mapping technology currently housed in multiple server racks
    – Increased flight mission time and track detection accuracy with improved data compression in manned flights
  • Autonomous threat detection, tracking and multi-sensor fusion
3. Designed a novel FPGA-based architecture that combines ultra-low-power operation with high reliability
  • Optimized the substrate and design mapping to minimize communication energy
![Page 14: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/14.jpg)
Acronyms Used

| Acronym | Meaning |
|---|---|
| ARGUS | Autonomous Real-time Ground-Ubiquitous Surveillance - Imaging System |
| CHM | Continuous Hierarchy Memory |
| FIT | Failures In Time |
| FPGA | Field Programmable Gate Array |
| GMM | Gaussian Mixture Model |
| LEARN | Low-Energy Architecture of Reconfigurable Nodes |
| L-K | Lucas-Kanade |
| LWC | Light Weight Checks |
| PERFECT | Power Efficiency Revolution for Embedded Computing Technology |
| RANSAC | RANdom SAmple Consensus |
| RFD | Robust Frame Differences |
| SLAM | Simultaneous Location And Mapping |
| TAILWIND | Tactical Aircraft to Increase Long Wave Infrared Nighttime Detection |
| UAV | Unmanned Aerial Vehicle |
| WAMI | Wide Area Motion Imagery |
![Page 15: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/15.jpg)
BACKUP
![Page 16: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/16.jpg)
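Minimal Streaming Design
Fully functional pipeline, 512×512 Image

| Metric | Debayer (Demosaicing) | LK (Registration) | GMM (Motion Detection and Tracking) | WAMI |
|---|---|---|---|---|
| PEs | 1 | 1 | 1 | (1,1,1) |
| Ops/Cycle | 20 | 55 | 85 | 90 |
| Area (cm²) | 0.0046 | 0.13 | 0.034 | 0.17 |
| Throughput (fps), 0.7 V Vdd | 540 | 58 | 430 | 57 |
| Efficiency (Gops/W), 0.7 V Vdd | 740 | 250 | 650 | 300 |
| Throughput (fps), 0.5 V Vdd | 110 | 12 | 85 | 11 |
| Efficiency (Gops/W), 0.5 V Vdd | 1500 | 500 | 1300 | 600 |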
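Comparing this table with the Parallel Optimized Design table shows how much throughput the extra PEs buy. The sketch below uses the 0.7 V throughput rows of both tables for LK and GMM. The dictionary names are illustrative, and "speedup per PE" is a simple derived metric, not one reported on the slides.

```python
# PE counts and throughput (fps at 0.7 V Vdd) from the two design tables.
minimal = {"LK": {"pes": 1, "fps": 58}, "GMM": {"pes": 1, "fps": 430}}
parallel = {"LK": {"pes": 4, "fps": 150}, "GMM": {"pes": 8, "fps": 1600}}

# Throughput gain of the parallel design over the minimal streaming design.
speedup = {k: parallel[k]["fps"] / minimal[k]["fps"] for k in minimal}

# Gain normalized by the number of PEs in the parallel design.
speedup_per_pe = {k: speedup[k] / parallel[k]["pes"] for k in minimal}

for kernel in minimal:
    print(f"{kernel}: {speedup[kernel]:.2f}x faster, "
          f"{speedup_per_pe[kernel]:.2f}x per PE")
```

The speedups come out to roughly 2.6x for LK on 4 PEs and 3.7x for GMM on 8 PEs, so scaling is sublinear while efficiency in Gops/W stays in the same range or improves.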
![Page 17: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/17.jpg)
WAMI on FPGA
![Page 18: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/18.jpg)
Optimizing Parallelism
[Figure: three panels (Debayer, LK, GMM) for a 512×512 image.]
![Page 19: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/19.jpg)
[Figure: bar chart, GMM Energy/Frame. X-axis: Memory Size (64k, 128k, 64k-CHM, 128k-CHM, 64k-CHM-S, 128k-CHM-S). Y-axis: Energy (J), 0 to 0.025. Legend: mem, route, logic, clk.]
![Page 20: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/20.jpg)
Energy reduction techniques*
• Use low-power technology
• Voltage scaling
• Use T-gates or gate-boosting (GB) instead of pass-T w/ level restorer
• Use dual-Vdd
• Power gating
*Edin Kadric, “Energy reduction through voltage scaling and lightweight checking”, PhD thesis.
![Page 21: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/21.jpg)
TAILWIND System Level Metrics

| Data Set | System Level Metric | SLAM + RFD | LK + RFD | SLAM + GMM |
|---|---|---|---|---|
| Quantico Scene 1 | P(vehicle detection) | 0.9 | 0.53 | 0.5 |
| | P(dismount detection) | 0.64 | 0.38 | 0.46 |
| | False alarms/minute | 1.35 | 103 | 163 |
| | P(tracking vehicle) | 0.3 | 0.1 | 0 |
| | P(tracking dismount) | 0.33 | 0.14 | 0.14 |
| Quantico Scene 2 | P(vehicle detection) | 0.82 | 0.89 | 0.3 |
| | P(dismount detection) | 0.52 | 0.71 | 0.25 |
| | False alarms/minute | 8.9 | 149 | 124 |
| | P(tracking vehicle) | 0.42 | 0.25 | 0 |
| | P(tracking dismount) | 0.44 | 0.28 | 0.1 |
| Quantico Scene 3 | P(vehicle detection) | 0.85 | 0.54 | 0.24 |
| | P(dismount detection) | 0.6 | 0.42 | 0.19 |
| | False alarms/minute | 4.1 | 117 | 126 |
| | P(tracking vehicle) | 0.15 | 0.13 | 0.03 |
| | P(tracking dismount) | 0.21 | 0.09 | 0.27 |

• TAILWIND
  – Fine-tuned parameters for the different kernels
• PERFECT
  – Dismounts are at a granularity of a few pixels, which is more than the difference between consecutive frames, hence large false alarm rates
  – GMM has lower levels of accuracy than RFD
  – Varying the GMM parameters (e.g. learning rate and # of
![Page 22: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/22.jpg)
ARGUS-IS Public Release Poster
![Page 23: PERFECT Case Studies Demonstrating Order of Magnitude ...dkw/papers/PERFECT... · Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, ... 10.“Demonstration](https://reader034.vdocument.in/reader034/viewer/2022052003/6015c526e1278542ce17ebb4/html5/thumbnails/23.jpg)
References from PERFECT project
1. Edin Kadric, David Lakata, and André DeHon, “Impact of Memory Architecture on FPGA Energy Consumption”, FPGA ’15, Monterey, California, USA, Feb 22-24, 2015, http://ic.ese.upenn.edu/pdf/meme_fpga2015.pdf.
2. Benjamin Gojman, Sirisha Nalmela, Nikil Mehta, Nicholas Howarth, and André DeHon, “GROK-LAB: Generating Real On-chip Knowledge for Intra-cluster Delays using Timing Extraction”, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 7, Number 4, DOI: 10.1145/2597889, December 2014.
3. E. Kadric, K. Mahajan, A. DeHon, "Kung Fu Data Energy - Minimizing Communication Energy in FPGA Computations", IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), May 11–13, 2014.
4. E. Kadric, K. Mahajan, A. DeHon, "Energy Reduction through Differential Reliability and Lightweight Checking", FPGA ’14, February 26–28, 2014, Monterey, California, USA.
5. Feng Liu, Soumyadeep Ghosh, Nick P. Johnson, David I. August, “CGPA: Coarse-Grained Pipelined Accelerators”, Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE, 1-5 June 2014, San Francisco, CA, http://dl.acm.org/citation.cfm?doid=2593069.2593105
6. Andre DeHon, “Fundamental Underpinnings of Reconfigurable Computing Architectures”, accepted for publication in the Proceedings of the IEEE, Volume 103, Issue 3, DOI: 10.1109/JPROC.2014.2387696, March 2015, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7086421
7. Feng Liu, Heejin Ahn, Stephen R. Beard, Taewook Oh, and David I. August, “DynaSpAM: Dynamic Spatial Architecture Mapping using Out of Order Instruction Schedules”, Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), ISBN 978-1-4503-3402-0/15/06, Portland, OR, USA, June 13-17, 2015.
8. Edin Kadric, David Lakata, and André DeHon, "Impact of Parallelism and Memory Architecture on FPGA Communication Energy", ACM Transactions on Reconfigurable Technology and Systems, Vol. 9, to be published, 2016.
9. “Demonstration of Applicability of PERFECT/PRACTICE Technology to TAILWIND Processing”, BAE Systems Technical Report for DARPA PERFECT Program
10. “Demonstration of Applicability of PERFECT-PRACTICE Technology to ARGUS-IS Processing”, BAE Systems Technical Report for DARPA PERFECT Program