architecture support for disciplined approximate...
TRANSCRIPT
![Page 1: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/1.jpg)
sa paASPLOS 2012
Hadi EsmaeilzadehAdrian SampsonLuis CezeDoug Burger
Architecture Supportfor DisciplinedApproximate Programming
University of Washington
Microsoft Research
![Page 2: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/2.jpg)
mobile devicesbattery usage
data centerspower & cooling costs
dark siliconutilization wall
![Page 3: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/3.jpg)
Disciplined approximate programming
Precise Approximate✗✓references
jump targets
JPEG header
pixel data
neuron weights
audio samples
video frames
The EnerJ programming language
safely interleave approximate and precise operation
![Page 4: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/4.jpg)
![Page 5: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/5.jpg)
EnergyErrors EnergyErrors
EnergyErrors EnergyErrors
![Page 6: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/6.jpg)
Perfect correctness is not required
information retrieval
machine learning
sensory data
scientific computing
physical simulation
games
augmented reality
computer vision
![Page 7: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/7.jpg)
@Approx float[] nums;⋮@Approx float total = 0.0f;for (@Precise int i = 0; i < nums.length; ++i) total += nums[i];return total / nums.length;
Disciplined approximate programmingThe EnerJ programming language
![Page 8: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/8.jpg)
@Approx float[] nums;⋮@Approx float total = 0.0f;for (@Precise int i = 0; i < nums.length; ++i) total += nums[i];return total / nums.length;
Disciplined approximate programmingThe EnerJ programming language
approximate data storage
![Page 9: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/9.jpg)
@Approx float[] nums;⋮@Approx float total = 0.0f;for (@Precise int i = 0; i < nums.length; ++i) total += nums[i];return total / nums.length;
Disciplined approximate programmingThe EnerJ programming language
approximate operations
![Page 10: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/10.jpg)
Hardware supportfor disciplined approximate programming
TruffleCoreCompiler
EnerJ Code
@Approx float[] nums;⋮@Approx float total = 0.0f;for (@Precise int i = 0; i < nums.length; ++i) total += nums[i];return total / nums.length;
![Page 11: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/11.jpg)
Hardware supportfor disciplined approximate programming
TruffleCoreCompiler
Compiler-directed approximation
Simplify hardware implementation
Safety checks at compile time
No expensive checks at run time
![Page 12: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/12.jpg)
Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results
Hardware supportfor disciplined approximate programming
![Page 13: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/13.jpg)
Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results
Hardware supportfor disciplined approximate programming
![Page 14: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/14.jpg)
Approximation-aware languages need:
Approximate operations
Approximate data
Fine-grained interleaving
+-÷×
&|ALU
registers caches main memory
ADD R1 R2 R3MOV R3 R4JMP 0x01234STL R1 0xABCDLDF R2 0xBCDEADD R1 R2 R3MOV R3 R4JMP 0x01234STL R1 0xABCDLDF R2 0xBCDEADD R1 R2 R3MOV R3 R4JMP 0x01234
![Page 15: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/15.jpg)
Approximation-aware languages need:
Approximate operations
Approximate data
+-÷×
&|ALU
registers caches main memory
per instruction
per cache line
![Page 16: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/16.jpg)
Traditional, precise semantics
ADD r1 r2 r3:
writes the sum of r1 and r2 to r3some value
![Page 17: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/17.jpg)
Approximate semantics
ADD r1 r2 r3:
writes the sum of r1 and r2 to r3some value
Informally: r3 gets something that approximates the sum r1 + r2.Actual error pattern depends on microarchitecture, voltage, process, variation, …
![Page 18: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/18.jpg)
Undefined behavior
ADD r1 r2 r3:
???
![Page 19: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/19.jpg)
Approximate semantics
ADD r1 r2 r3:
writes the sum of r1 and r2 to r3some value
Informally: r3 gets something that approximates the sum r1 + r2.
No other register is modified.
Does not jump to an arbitrary address.No floating point division exception is raised.
No missiles are launched.⋮
![Page 20: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/20.jpg)
An ISA extensionwith approximate semantics
operationsADD.aMUL.aCMPLE.a
AND.aXNOR.aSRL.a
ADDF.aDIVF.a…ALU
storageregisterscaches
main memory
LDL.aSTL.a STF.a
LDF.a …
![Page 21: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/21.jpg)
Dual-voltage pipeline
Fetch Decode Reg Read Execute Memory Write Back
Branch Predictor
Instruction Cache
ITLB
Decoder Register File
Integer FU
FP FU
Data Cache
DTLB
Register File
data movement & processing planecontrol plane
![Page 22: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/22.jpg)
Dual-voltage pipeline
Register File
Integer FU
FP FU
Data Cache
![Page 23: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/23.jpg)
Dual-voltage pipeline
Register File
Integer FU
FP FU
Data Cache
Integer FU
FP FU
switch replicate switch(dynamic) (dynamic)(static)
![Page 24: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/24.jpg)
Dual-voltage functional units:shadow structures
ExecuteStage
operands result
One structure isactive at a time.
![Page 25: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/25.jpg)
Dual-voltage functional units:shadow structures
Issue width not changed(scheduler is unaware of shadowing)
Inactive unit is power-gated
No voltage change latency
![Page 26: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/26.jpg)
Approximate storage:register modes
r1
r2
r3
r4
r5
r6
r7
r8
⋮
r4
precise modeapproximate mode
Reads from registersin approximate modemay return any value.
![Page 27: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/27.jpg)
Approximate storage:register modes
r1
r2
r3
r4
r5
r6
r7
r8
⋮
ADD r1 r2 r3
![Page 28: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/28.jpg)
Approximate storage:register modes
r1
r2
r3
r4
r5
r6
r7
r8
⋮
ADD.a r1 r2 r3r3
The destination register’smode is set to match thewriting instruction.
![Page 29: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/29.jpg)
Approximate storage:register modes
r1
r2
r3
r4
r5
r6
r7
r8
⋮
r3
r4ADD r2 r3.a r4
Register operandsmust be marked withthe register’s mode.(Otherwise, read garbage.)
![Page 30: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/30.jpg)
Registers and caches:dual-voltage SRAMs
001110101101
precisioncolumn
dataVDDH VDDL
row selectiondata (read)+ data (write)
+ precision
DV-SRAM subarray
(for sense amplifiers and
precharge)
![Page 31: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/31.jpg)
Registers and caches:dual-voltage SRAMs
Mixture of precise and approximate data
Instruction stream gives access levels(compiler-specified)
![Page 32: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/32.jpg)
Approximate storage:caches
r1
r2
r3
r4
r5
r6
r7
r8
⋮
LDL.a 0x…
r3
r4
r1
Cache
Data enters cache with precision of the access.Compiler: consistently treat data as approximate or precise.(Otherwise, read garbage.)
![Page 33: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/33.jpg)
Approximate main memory
Detailed DV-SRAM design
Voltage level-shifter and mux circuits
Replicated pipeline registers
Broadcast network details
Also in the paper
0-VddHoutput
VddH VddH
VddL
input
0-VddHprecision 0 -Vdd(H/L)
VddH
VddL
0 -VddLoutput
VddH
0-VddHprecision
VddH
input
0 -Vdd(H/L)VddH
0 -Vdd(H/L)input[0]
0 -Vdd(H/L)input[1]
0 -Vdd(H/L)output
0-VddHselect
![Page 34: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/34.jpg)
Approximation-aware ISA
Dual-voltage microarchitecture
Energy savings results
Hardware supportfor disciplined approximate programming
![Page 35: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/35.jpg)
Energy savings results
Simulated EnerJ programsPrecision-annotated Java [PLDI’11]Scientific kernels, mobile app, game engine, imaging, raytracer
Modified McPAT models for OoO (Alpha 21264) and in-order cores[Li, Ahn, Strong, Brockman, Tullsen, Jouppi; MICRO’09]65 nm process, 1666 MHz, 1.5 V nominal (VDDH)4-wide (OoO) and 2-wide (in-order)Includes overhead of additional muxing, shadow FUs, etc.
Extended CACTI for DV-SRAM structures[Muralimanohar, Balasubramonian, and Jouppi; MICRO’07]64 KB (OoO) and 32 KB (in-order) L1 cacheLine size: 16 bytesIncludes precision column overhead
![Page 36: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/36.jpg)
Energy savings on in-order core
7–24% energy saved on averageRaytracer saves 14–43% energy
-10%
0%
10%
20%
30%
40%
50%
fft imagefill jmeint lu mc raytracer smm sor zxing average
ener
gy re
duct
ion
over
non
-Tru
ffle
0.75 V 0.94 V 1.13 V 1.31 VVDDL =
![Page 37: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/37.jpg)
Energy savings on OoO core
Energy savings up to 17%Efficiency loss up to 5% in the worst case
-10%
0%
10%
20%
30%
40%
50%
fft imagefill jmeint lu mc raytracer smm sor zxing average
ener
gy re
duct
ion
over
non
-Tru
ffle
0.75 V 0.94 V 1.13 V 1.31 VVDDL =
![Page 38: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/38.jpg)
Application accuracy trade-off
fft imagefill jmeint lu mc raytracer smm sor zxing
0%
20%
40%
60%
80%
100%
outp
ut q
uality
-of-s
ervic
e lo
ss
10-8 10-7 10-6 10-5 10-4 10-3 10-2
Application-specific output quality metricsError resilience varies across applications
![Page 39: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/39.jpg)
Hardware support fordisciplined approximate programming
TruffleCoreCompiler
int p = 5;@Approx int a = 7;for (int x = 0..) {
a += func(2);@Approx int z;z = p * 2;p += 4;
}a /= 9;func2(p);a += func(2);@Approx int y;z = p * 22 + z;p += 10;
VDDH
VDDL
![Page 40: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/40.jpg)
Hardware support fordisciplined approximate programming
Approximation-aware ISATightly coupled with language-level precision information
Dual-voltage microarchitectureData plane can run at lower voltageLow-complexity design relying on compiler support
Significant energy savingsUp to 43% vs. a baseline in-order core
![Page 41: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/41.jpg)
Future work ondisciplined approximate programming
Approximate accelerators
Precision-aware programmer tools
Non-voltage approximation techniques
![Page 42: Architecture Support for Disciplined Approximate Programmingasampson/media/truffle-asplos-slides.pdf · Architecture Support for Disciplined Approximate Programming University of](https://reader034.vdocument.in/reader034/viewer/2022052611/5f09586a7e708231d4266450/html5/thumbnails/42.jpg)