automatic kernel code generation for focal-plane sensor
![Page 1: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/1.jpg)
Automatic Kernel Code Generation for Focal-plane Sensor-Processor Devices
Thomas Debrunner - MSc Student, Imperial College London
Paul Kelly - Software Performance Optimisation Group Lead, Imperial College London
Sajad Saeedi - Research Fellow, Imperial College London
![Page 2: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/2.jpg)
With kind support from Piotr Dudek and his team at Manchester University
This work is part of the EPSRC “PAMELA” Project
![Page 3: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/3.jpg)
Cameras produce images for humans, not machines
![Page 4: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/4.jpg)
http://personalpages.manchester.ac.uk/staff/p.dudek/papers/carey-cnna2012.pdf
SCAMP 5 focal-plane sensor processor
• 256x256 SIMD processor array
• Light sensor on every processor
• Ca. 170 transistors per processor
Piotr Dudek and colleagues at Manchester University
![Page 5: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/5.jpg)
SCAMP 5 focal-plane sensor processor
• Seven registers holding analogue values
• Computation by moving charge
• Addition is easy
• No multiply
• North-east-west-south data movement
![Page 6: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/6.jpg)
Basic instruction set (of interest)
• Shift image x
• Shift image y
• Add two images
• Subtract two images
• Scale image by 1/2
• Take absolute value of image
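The six instructions above can be mimicked in software to experiment with filter plans. The following is a minimal sketch, not the SCAMP 5 API: the function names, the list-of-rows image format, the zero fill at the array border and the sign convention for shifts are all assumptions made for illustration.

```python
# Toy software model of the six instructions listed above.
# Assumptions (not from the slides): images are lists of rows of floats,
# pixels shifted in from outside the array read as 0, and positive dx/dy
# move image content right/down.

def shift_x(img, dx):
    """Move image content dx pixels to the right (negative dx: left)."""
    w = len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else 0.0 for x in range(w)]
            for row in img]

def shift_y(img, dy):
    """Move image content dy pixels down (negative dy: up)."""
    h, w = len(img), len(img[0])
    return [list(img[y - dy]) if 0 <= y - dy < h else [0.0] * w
            for y in range(h)]

def add(a, b):
    """Pixel-wise sum of two images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Pixel-wise difference of two images."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def div2(a):
    """Scale every pixel by 1/2."""
    return [[p / 2 for p in row] for row in a]

def absolute(a):
    """Pixel-wise absolute value."""
    return [[abs(p) for p in row] for row in a]

# A single bright pixel in the middle of a 3x3 image.
img = [[0.0, 0.0, 0.0],
       [0.0, 4.0, 0.0],
       [0.0, 0.0, 0.0]]
moved_right = shift_x(img, 1)
moved_up = shift_y(img, -1)
```

Every operation acts on the whole image at once, which is how a SIMD array with one processor per pixel would be programmed.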
![Page 7: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/7.jpg)
This talk
• How to do convolution filters on SCAMP 5?
• For image filtering
• As a component in image processing algorithms
• Notably CNNs
• Potential
• Low power
• Extreme effective frame rate
• Example: Viola-Jones face detection
• A compiler: general code generator producing highly optimised convolution implementations
![Page 8: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/8.jpg)
[Figure: bar chart of filter time [μs] for the Gauss3, Box7 and Sobel filters on CPU, GPU and CPA]
CPU: Intel i7-6700, GPU: NVIDIA Titan X, CPA: SCAMP-5c estimate
![Page 9: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/9.jpg)
Convolution filters on SCAMP 5: easy filters
We can add repeatedly, so we can multiply by a constant
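As a rough illustration of multiplying by a constant through repeated addition, the sketch below builds a horizontal [1 2 1] kernel from shifts and additions only. The kernel is chosen here as an example and is not taken from the slide; the helper names, the list-of-rows image format and the zero fill at the border are assumptions.

```python
# Sketch: integer kernel weights realised by adding an image to itself.
# Assumptions: images are lists of rows, out-of-range pixels read as 0.

def shift_x(img, dx):
    """Move image content dx pixels to the right (negative dx: left)."""
    w = len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else 0.0 for x in range(w)]
            for row in img]

def add(a, b):
    """Pixel-wise sum of two images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def times3(img):
    """3 * img using two additions and no multiplier."""
    doubled = add(img, img)
    return add(doubled, img)

def filter_1_2_1(img):
    """Horizontal [1 2 1] kernel: left + 2 * centre + right."""
    left = shift_x(img, 1)     # each pixel now holds its left neighbour
    right = shift_x(img, -1)   # each pixel now holds its right neighbour
    centre2 = add(img, img)    # weight 2 by adding the image to itself
    return add(add(left, right), centre2)

row = [[1.0, 2.0, 3.0, 4.0]]
tripled = times3(row)
filtered = filter_1_2_1(row)
```

The cost of a weight grows with its size, which is one reason a code generator that finds short addition sequences is useful.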
![Page 10: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/10.jpg)
Convolution filters on SCAMP 5: harder filters
![Page 11: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/11.jpg)
Convolution filters on SCAMP 5: harder filters, still easy
We can divide by two repeatedly
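Weights that are powers of 1/2 can be reached by halving repeatedly. The sketch below computes a horizontal [1/4, 1/2, 1/4] kernel with two halvings; the kernel is an illustrative choice, and the helper names, image format and zero border are assumptions rather than details from the talk.

```python
# Sketch: fractional weights from repeated division by two.
# Assumptions: images are lists of rows, out-of-range pixels read as 0.

def shift_x(img, dx):
    """Move image content dx pixels to the right (negative dx: left)."""
    w = len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else 0.0 for x in range(w)]
            for row in img]

def add(a, b):
    """Pixel-wise sum of two images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def div2(a):
    """Scale every pixel by 1/2."""
    return [[p / 2 for p in row] for row in a]

def smooth_quarter_half_quarter(img):
    """Horizontal [1/4, 1/2, 1/4] kernel using two additions and two halvings."""
    neighbours = add(shift_x(img, 1), shift_x(img, -1))  # left + right
    half = div2(neighbours)                              # (left + right) / 2
    return div2(add(half, img))                          # left/4 + centre/2 + right/4

row = [[0.0, 4.0, 8.0, 4.0, 0.0]]
smoothed = smooth_quarter_half_quarter(row)
```

Halving the partial sum of both neighbours once, instead of halving each neighbour separately, already saves an instruction.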
![Page 12: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/12.jpg)
Convolution filters on SCAMP 5: hard filters
![Page 13: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/13.jpg)
Convolution filters on SCAMP 5: hard filters, easy again
We can approximate
![Page 14: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/14.jpg)
We can approximate
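The slides do not say how the approximation is chosen. One plausible approach, given that only halving is available, is to approximate each weight by a short sum of powers of 1/2. The greedy expansion below is an assumption made for illustration, and `dyadic_terms` and `value` are hypothetical names.

```python
# Sketch: approximate a weight in [0, 1) by a sum of terms 1/2**k.
# This greedy scheme is an illustrative assumption, not the talk's method.

def dyadic_terms(coeff, max_depth):
    """Return the exponents k (1..max_depth) whose 1/2**k terms are used."""
    terms, remainder = [], coeff
    for k in range(1, max_depth + 1):
        step = 1 / 2 ** k
        if remainder >= step:
            terms.append(k)
            remainder -= step
    return terms

def value(terms):
    """Value represented by a list of exponents."""
    return sum(1 / 2 ** k for k in terms)

# Approximate 1/3 with at most four halvings: 1/4 + 1/16.
terms = dyadic_terms(1 / 3, 4)
error = abs(1 / 3 - value(terms))
```

A larger `max_depth` lowers the error but needs more instructions, which is the trade of approximation against efficiency that the talk refers to.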
![Page 15: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/15.jpg)
Filters often have repeated terms
We implement multiplication using summations – so there are lots of common subterms
We can shift intermediate values to save redundant computation
![Page 16: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/16.jpg)
Simple motivating (extreme) example: 5x5 box filter
[Figure: diagram building the 5x5 box filter from additions (+) and shifts labelled (1) and (2)]
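To make the saving concrete, the sketch below computes a 5x5 box sum in six additions by shifting and reusing partial sums, and checks it against the sum of all 25 shifted copies. The particular sequence of shifts is one possible plan chosen for illustration, not necessarily the one in the figure; the helper names and the zero border are assumptions.

```python
# Sketch: 5x5 box sum with reuse of shifted intermediate sums.
# Assumptions: images are lists of rows, out-of-range pixels read as 0.

def shift(img, dx, dy):
    """Move image content dx pixels right and dy pixels down, zero-filling."""
    h, w = len(img), len(img[0])
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0
             for x in range(w)] for y in range(h)]

def add(a, b):
    """Pixel-wise sum of two images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def box5_reuse(img):
    """5x5 box sum in 6 additions."""
    # Horizontal pass: 1x5 sum in 3 additions.
    pair = add(img, shift(img, 1, 0))                  # columns x-1..x
    quad = add(shift(pair, -1, 0), shift(pair, 1, 0))  # columns x-2..x+1
    row5 = add(quad, shift(img, -2, 0))                # columns x-2..x+2
    # Vertical pass: the same pattern applied to the row sums.
    pair = add(row5, shift(row5, 0, 1))
    quad = add(shift(pair, 0, -1), shift(pair, 0, 1))
    return add(quad, shift(row5, 0, -2))

def box5_naive(img):
    """Reference: sum of all 25 shifted copies of the image."""
    total = [[0] * len(img[0]) for _ in img]
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            total = add(total, shift(img, dx, dy))
    return total

# Test image with content away from the border, so zero-fill has no effect.
img = [[0] * 11 for _ in range(11)]
for y in range(4, 7):
    for x in range(4, 7):
        img[y][x] = 3 * y + x
same = box5_reuse(img) == box5_naive(img)
```

Near the array border the two versions can differ, because a shifted intermediate sum loses whatever was pushed off the edge.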
![Page 17: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/17.jpg)
Finding a plan: end point
“Final Set” (FS) of Partial Value Representatives (PVR)
The set of summands we need for the result of the filter application
![Page 18: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/18.jpg)
Finding a plan: starting point
“Initial Set” (IS)
The set of summands of a fresh image
![Page 19: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/19.jpg)
Objective
Find a sequence of operations to transform IS (identity filter) into FS (desired filter)
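A small sketch of how the two sets might be represented in software: each summand is an (x, y) offset, and a kernel weight of n puts n copies of that offset into the set. The `Counter` representation, the function name `final_set` and the restriction to non-negative integer weights are assumptions for illustration.

```python
# Sketch: building the Final Set of a kernel as a multiset of offsets.
# Assumption: kernel weights are non-negative integers.
from collections import Counter

def final_set(kernel):
    """Multiset of (x, y) offsets; an offset with weight n appears n times."""
    h, w = len(kernel), len(kernel[0])
    cy, cx = h // 2, w // 2          # kernel centre maps to offset (0, 0)
    fs = Counter()
    for y, row in enumerate(kernel):
        for x, weight in enumerate(row):
            if weight:
                fs[(x - cx, y - cy)] += weight
    return fs

# A fresh image is the identity filter: one summand at offset (0, 0).
INITIAL_SET = Counter({(0, 0): 1})

# The kernel [2 1 2] gives the summands (-1 0)(-1 0)(0 0)(1 0)(1 0).
fs = final_set([[2, 1, 2]])
summands = sorted(fs.elements())
```

With this view, finding a plan means finding instructions that turn `INITIAL_SET` into `fs`.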
![Page 20: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/20.jpg)
Instructions as transformations: shifts
(0 0) (1 -1)(2 4) (3 3)
→(1) ↓(1)
![Page 21: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/21.jpg)
Instructions as transformations: scales (Div2)
(0 0)(0 0)
+(1) (0 0)
![Page 22: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/22.jpg)
Instructions as transformations: additions / subtractions
(0 1)
+
(0 1)(1 2)
(1 2)
+
![Page 23: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/23.jpg)
Reverse Split
FS
A B R
A, B transformable
Recursive, continue with B, R
IS
![Page 24: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/24.jpg)
Reverse Split: pruning
We prune splits that would exceed the number of registers in the SCAMP 5 device (seven)
We prune subtrees when the resulting instruction sequence is longer than the best so far
We attempt heuristically-promising splits first
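The three pruning rules can be illustrated on a much simpler stand-in problem. The sketch below is not the reverse-split search: it looks for the fewest additions that produce `target` copies of an image, starting from one copy, using a depth-first search with the same three ingredients (a register limit, best-so-far pruning, and a heuristic ordering). All names are hypothetical, and treating every intermediate value as occupying a register is a deliberate simplification.

```python
# Toy branch-and-bound search (a stand-in, not the talk's algorithm):
# find the shortest chain 1 = c0 < c1 < ... = target where every element
# is the sum of two earlier elements.

def shortest_add_chain(target, max_registers=7):
    best = None

    def search(chain):
        nonlocal best
        if chain[-1] == target:
            best = list(chain)   # always shorter than the previous best
            return
        # Prune: one more addition could not beat the best plan so far.
        if best is not None and len(chain) + 1 >= len(best):
            return
        # Prune: the plan would need more values than there are registers.
        if len(chain) >= max_registers:
            return
        # Heuristic ordering: try the largest reachable sums first.
        sums = {a + b for a in chain for b in chain}
        for s in sorted(sums, reverse=True):
            if chain[-1] < s <= target:
                search(chain + [s])

    search([1])
    return best

chain8 = shortest_add_chain(8)
chain15 = shortest_add_chain(15)
```

Trying promising candidates first matters because a good early solution tightens the best-so-far bound and cuts off most of the remaining tree.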
![Page 25: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/25.jpg)
Example
node1 = east(node0)
node2 = west(node1)
node2 = west(node2)
node4 = west(node1)
node4 = div2(node4)
node3 = add(node2, node1)
node6 = add(node3, node4)
![Page 26: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/26.jpg)
Graph Relaxation
Apply a systematic retiming to minimize shifts
![Page 27: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/27.jpg)
Register Allocation
[Figure: graph with nodes numbered 0 to 5]
Final resulting code:
B = west(A)
C = div2(A)
B = add(C, B)
A = east(A)
A = add(B, A)
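The final code can be checked against the node-based example by running both in a small simulator. This is a sketch under assumptions: the mapping of `west` and `east` to shift directions, the zero fill at the border and the list-of-rows image format are choices made here, not details from the slides.

```python
# Sketch: simulate the node-based plan and the final register code.
# Assumptions: west() brings in the left neighbour's value, east() the
# right neighbour's, and out-of-range pixels read as 0.

def shift_x(img, dx):
    w = len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else 0.0 for x in range(w)]
            for row in img]

def west(img):
    return shift_x(img, 1)

def east(img):
    return shift_x(img, -1)

def div2(img):
    return [[p / 2 for p in row] for row in img]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def plan_before_relaxation(node0):
    """Node-based example: four shifts, one halving, two additions."""
    node1 = east(node0)
    node2 = west(node1)
    node2 = west(node2)
    node4 = west(node1)
    node4 = div2(node4)
    node3 = add(node2, node1)
    return add(node3, node4)

def final_code(A):
    """Final code: two shifts, one halving, two additions, three registers."""
    B = west(A)
    C = div2(A)
    B = add(C, B)
    A = east(A)
    return add(B, A)

impulse = [[0.0, 0.0, 2.0, 0.0, 0.0]]
before = plan_before_relaxation(impulse)
after = final_code(impulse)
```

On this impulse both versions give the same response, with weight 1 on each neighbour and 1/2 on the centre, while the final code needs half as many shifts.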
![Page 28: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/28.jpg)
Full exhaustive search, compared to heuristic search on Sobel 3×3 filter (sampled over 256 runs)
Evaluation
![Page 29: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/29.jpg)
• SCAMP 5: estimated based on 10MHz clock rate
• 8 common filter examples on 256×256 8-bit grayscale image
• CPU and GPU: default implementations shipped with OpenCV 3.3.0, with TBB and IPP enabled and with CUDA V8.0.61
• Power estimated based on TDP and time
Evaluation
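One way to read the two estimation bullets, written out as formulas. The symbols are introduced here for illustration, the cycle count per filter is not given in the slides, and interpreting "TDP and time" as an energy estimate per filter run is an assumption:

```latex
t_{\text{SCAMP}} \approx \frac{N_{\text{cycles}}}{f_{\text{clk}}},
\qquad f_{\text{clk}} = 10\,\text{MHz},
\qquad
E \approx P_{\text{TDP}} \cdot t
```

Here $N_{\text{cycles}}$ is the number of clock cycles the generated code needs, $t$ is the measured or estimated filter time, and $P_{\text{TDP}}$ is the thermal design power of the device. Using TDP gives an upper-bound style estimate rather than a measurement.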
![Page 30: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/30.jpg)
7-stage Viola-Jones face detector
• Due to code size and other limitations, we were only able to run a 7-stage Viola-Jones face detector
• It works as well as a 7-stage CPU implementation
• But for full accuracy, 25 stages are needed. SCAMP 5 would be slower than CPUs, but uses much less energy
![Page 31: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/31.jpg)
Conclusions
Convolution filters are a key capability
With a suitable code generator we can do a lot with very, very simple hardware
By trading approximation against efficiency we can do even more
Near-camera processing is the only way we can approach biological levels of energy efficiency
There is a spectrum of design choices:
• How much to do in analogue
• Where to convert to digital
• How compute is distributed and connected to the sensors
• How to preprocess to reduce larger-scale data movement
![Page 32: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/32.jpg)
Backup
![Page 33: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/33.jpg)
Reverse Split
FS
A B R
A, B transformable
![Page 34: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/34.jpg)
Example
![Page 35: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/35.jpg)
FS: (-1 0) (-1 0) (0 0) (1 0) (1 0)
A: (1 0) (1 0)
B: (-1 0) (-1 0)
R: (0 0)
![Page 36: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/36.jpg)
FS: (-1 0) (-1 0) (0 0) (1 0) (1 0)
A: (1 0) (1 0)
B: (-1 0) (-1 0)
R: (0 0)
Operations shown: →(2), +
![Page 37: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/37.jpg)
B: (-1 0) (-1 0)
R: (0 0)
![Page 38: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/38.jpg)
FS
A B R
A B R1 R2
FS
A B R
A B2 R B1
FS
A B R
A B R1 R2
![Page 39: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/39.jpg)
(-1 0)(-1 0)
(0 0)
B
R
(-1 0)(-1 0)
(0 0)
B
A
→(1) +(1)
![Page 40: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/40.jpg)
(-1 0)(-1 0)
B
![Page 41: Automatic Kernel Code Generation for Focal-plane Sensor](https://reader030.vdocument.in/reader030/viewer/2022020621/61e9cc53ca4d1e461548f6f8/html5/thumbnails/41.jpg)
(-1 0)(-1 0)
BIG(0 0)(0 0)
←(1)