The Embedded Learning Library - tinyML
TRANSCRIPT
![Page 1: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/1.jpg)
The Embedded Learning Library
![Page 2: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/2.jpg)
The Embedded Learning Library (ELL)
Cross-compiler for AI pipelines, specialized for resource-constrained target platforms
https://github.com/Microsoft/ELL
[Diagram: AI Pipeline → ELL → Target Machine Code]
![Page 3: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/3.jpg)
• 3 years at Microsoft Research
• compiler toolchain, tutorials, model gallery
• focus: ARM CPUs, embedded GPUs; vision on ARM Cortex-A53, keyword spotting on ARM Cortex-M4F
The Embedded Learning Library
![Page 4: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/4.jpg)
Architecture
[Diagram. Components shown: Dataset, Pretrained Model, ELL Trainers, Importers, Computation Graph Optimizer, Target Profiles, ELL Platform Abstraction Layer, LLVM Emitter, OpenCL Emitter, LLVM, OpenCL, BLAS, Target]
![Page 5: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/5.jpg)
AI compiler vs. AI runtime
why AI compiler?
• model-specific optimization
• target-specific optimization
• small executable
why AI runtime?
• portability
• seamless migration from cloud to edge
best of both worlds: just-in-time AI compiler
![Page 6: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/6.jpg)
compression techniques:
• efficient architectures
• pruning
• low precision math and quantization
• low rank matrix approximation
Evaluation
small loss in accuracy, large gain in cost
![Page 7: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/7.jpg)
Architecture search: model Pareto frontier (January 2018)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
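A minimal sketch of how a model Pareto frontier like the one plotted here can be extracted from a set of candidates. The function name `pareto_frontier` and the data points are hypothetical illustrations, not measurements from the talk; the sketch assumes lower ms/image and higher top-1 are both better.

```python
def pareto_frontier(models):
    """Return the models that no other model beats on both latency and accuracy.

    Each model is a (name, ms_per_image, top1) tuple.
    """
    frontier = []
    best_top1 = float("-inf")
    # Walk from fastest to slowest; on equal latency, try the more accurate model first.
    for name, ms, top1 in sorted(models, key=lambda m: (m[1], -m[2])):
        # A slower model earns its place only if it is more accurate
        # than every faster model seen so far.
        if top1 > best_top1:
            frontier.append((name, ms, top1))
            best_top1 = top1
    return frontier


# Hypothetical candidates: (name, ms/image, top-1 %).
candidates = [
    ("a", 100, 40.0),
    ("b", 200, 55.0),
    ("c", 250, 50.0),  # dominated by "b": slower and less accurate
    ("d", 600, 65.0),
    ("e", 900, 64.0),  # dominated by "d"
]

frontier = pareto_frontier(candidates)
print([name for name, _, _ in frontier])  # ['a', 'b', 'd']
```

Each month's plot can then be read as the same computation repeated over a growing pool of candidate architectures.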
![Page 8: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/8.jpg)
Architecture search (January 2018)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 9: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/9.jpg)
Architecture search (February 2018)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 10: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/10.jpg)
Architecture search (March 2018)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 11: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/11.jpg)
Architecture search (April 2018)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 12: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/12.jpg)
• variety of convolution kernels
• scheduling
• engineering
Lossless acceleration
![Page 13: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/13.jpg)
Lossless acceleration (January 2019)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 14: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/14.jpg)
Lossless acceleration (February 2019)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 15: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/15.jpg)
Lossless acceleration (March 2019)
[Plot: ILSVRC 2012 top-1 (30 to 70) vs. ms/image on RPi3@700MHz (0 to 1000)]
![Page 16: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/16.jpg)
mix and match compression techniques
engineering/ML co-design
during training vs post processing
Lossy Acceleration
![Page 17: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/17.jpg)
Quantization semantics: binary, ternary, linear, exponential, lookup/clustered, iterative sum
• bit → value: 0 → 0, 1 → 1
• bit → value: 0 → -1, 1 → 1
• bits → value: 00 → 0, 01 → 1, 10 → n/a, 11 → -1
• bits 0…k → value in [0...2^k - 1]
• bits 0…k → value in [-2^(b-1)-1...2^(b-1)-1]
• bits 0…k → value by lookup
• bits 0…k → value a±b±c±.. ±n
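A small sketch of what some of these code-to-value mappings look like as decoders. It covers only the two binary tables, the ternary table, the unsigned linear range, and the lookup case; all function names are hypothetical, and the lookup table contents are made up for illustration.

```python
def decode_binary_01(bit):
    """Binary code with values {0, 1}: 0 -> 0, 1 -> 1."""
    return bit


def decode_binary_pm1(bit):
    """Binary code with values {-1, 1}: 0 -> -1, 1 -> 1."""
    return 2 * bit - 1


# Ternary code from the table: 00 -> 0, 01 -> 1, 11 -> -1; 10 is unused ("n/a").
# The two bits can be read as (sign, magnitude).
TERNARY_VALUES = {0b00: 0, 0b01: 1, 0b11: -1}


def decode_ternary(code):
    if code not in TERNARY_VALUES:
        raise ValueError("code 0b10 has no ternary value")
    return TERNARY_VALUES[code]


def decode_linear_unsigned(code, k):
    """Unsigned linear code: a k-bit code is its own value in [0, 2^k - 1]."""
    if not 0 <= code < (1 << k):
        raise ValueError("code does not fit in k bits")
    return code


def decode_lookup(code, table):
    """Lookup/clustered code: the code indexes a table of representative values."""
    return table[code]


print([decode_ternary(c) for c in (0b00, 0b01, 0b11)])  # [0, 1, -1]
print(decode_lookup(2, [-0.5, 0.0, 0.25, 1.0]))         # 0.25 (hypothetical table)
```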
![Page 18: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/18.jpg)
Quantization representation
bit packed:
b3 b2 b1 b0 a3 a2 a1 a0
d3 d2 d1 d0 c3 c2 c1 c0
bit planes:
d0 c0 b0 a0
d1 c1 b1 a1
d2 c2 b2 a2
d3 c3 b3 a3
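The two layouts can be sketched as follows, assuming four 4-bit codes a, b, c, d with the first element stored in the lowest bits, as in the diagram. The helper names `pack_bits` and `to_bit_planes` are hypothetical, and the packed layout is held in one Python integer rather than in two separate bytes.

```python
def pack_bits(values, bits):
    """Bit packed: the codes sit side by side, first value in the lowest bits."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def to_bit_planes(values, bits):
    """Bit planes: plane j collects bit j of every value, first value in the lowest bit."""
    planes = []
    for j in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> j) & 1) << i
        planes.append(plane)
    return planes


# Hypothetical 4-bit codes for a, b, c, d.
values = [0b0001, 0b0010, 0b0100, 0b1000]

packed = pack_bits(values, 4)
planes = to_bit_planes(values, 4)

print(f"{packed:016b}")              # 1000010000100001  (d c b a)
print([f"{p:04b}" for p in planes])  # ['0001', '0010', '0100', '1000']
```

The bit-plane layout is what makes the popcount-based dot product possible: each plane can be combined with a weight mask in a single bitwise operation.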
![Page 19: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/19.jpg)
Quantization example
ternary weights, 3-bit unsigned linear activations (bitplane)
activations: 5 1 7 6 3 4 2 5
weights: 1 -1 0 -1 -1 -1 1 0
dot = 5*1 + 1*-1 + 7*0 + 6*-1 + 3*-1 + 4*-1 + 2*1 + 5*0 = -7
![Page 20: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/20.jpg)
Quantization example
activations: 5 1 7 6 3 4 2 5
weights: 1 -1 0 -1 -1 -1 1 0
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign: 0 1 0 1 1 1 0 0
weights, magnitude: 1 1 0 1 1 1 1 0
![Page 21: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/21.jpg)
![Page 22: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/22.jpg)
Quantization example
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign (s): 0 1 0 1 1 1 0 0
weights, magnitude (m): 1 1 0 1 1 1 1 0
step for bit 0 (a = activations, bit 0):
absSum: o = a & m
absSum += popcount(o)
negSum: o = a & s
negSum += popcount(o)
o = 11101001 & 11011110 = 11001000
absSum += popcount(o), so absSum = 3
o = 11001000 & 01011100 = 01001000
negSum += popcount(o), so negSum = 2
![Page 23: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/23.jpg)
Quantization example
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign (s): 0 1 0 1 1 1 0 0
weights, magnitude (m): 1 1 0 1 1 1 1 0
step for bit 1 (a = activations, bit 1):
absSum: o = a & m
absSum += popcount(o) << 1
negSum: o = a & s
negSum += popcount(o) << 1
o = 00111010 & 11011110 = 00011010
absSum += popcount(o) << 1, so absSum = 3 + 2*3 = 9
o = 00011010 & 01011100 = 00011000
negSum += popcount(o) << 1, so negSum = 2 + 2*2 = 6
![Page 24: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/24.jpg)
Quantization example
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign (s): 0 1 0 1 1 1 0 0
weights, magnitude (m): 1 1 0 1 1 1 1 0
step for bit 2 (a = activations, bit 2):
absSum: o = a & m
absSum += popcount(o) << 2
negSum: o = a & s
negSum += popcount(o) << 2
total = absSum - 2 * negSum
o = 10110101 & 11011110 = 10010100
absSum += popcount(o) << 2, so absSum = 9 + 4*3 = 21
o = 10010100 & 01011100 = 00010100
negSum += popcount(o) << 2, so negSum = 6 + 4*2 = 14
total = 21 - 2 * 14 = -7
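The three steps above can be put together in a short sketch. This is an illustration of the bit-plane arithmetic only, not ELL's emitted code; the helper names `pack_row`, `popcount`, and `ternary_bitplane_dot` are hypothetical. Rows are packed with the first element in the most significant bit so that the printed masks match the rows as written on the slides.

```python
def pack_row(flags):
    """Pack a list of 0/1 flags into an int, first element in the most significant bit."""
    word = 0
    for f in flags:
        word = (word << 1) | f
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def ternary_bitplane_dot(activations, weights, act_bits):
    """Dot product of unsigned act_bits-bit activations with ternary weights."""
    m = pack_row([1 if w != 0 else 0 for w in weights])  # magnitude mask
    s = pack_row([1 if w < 0 else 0 for w in weights])   # sign mask
    abs_sum = 0  # sum of activations whose weight is nonzero
    neg_sum = 0  # sum of activations whose weight is -1
    for j in range(act_bits):
        a = pack_row([(x >> j) & 1 for x in activations])  # activation bit plane j
        abs_sum += popcount(a & m) << j
        neg_sum += popcount(a & s) << j
    # Negative terms were counted once as positive in abs_sum, so subtract them twice.
    return abs_sum - 2 * neg_sum


activations = [5, 1, 7, 6, 3, 4, 2, 5]
weights = [1, -1, 0, -1, -1, -1, 1, 0]

result = ternary_bitplane_dot(activations, weights, act_bits=3)
reference = sum(a * w for a, w in zip(activations, weights))
print(result, reference)  # -7 -7
```

Because every set bit in the sign mask is also set in the magnitude mask, `a & s` gives the same result as masking `a & m` with `s`, which is the form used in the worked numbers.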
![Page 25: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/25.jpg)
Quantization example
activations, bit 0: 1 1 1 0 1 0 0 1
activations, bit 1: 0 0 1 1 1 0 1 0
activations, bit 2: 1 0 1 1 0 1 0 1
weights, sign: 0 1 0 1 1 1 0 0
weights, magnitude: 1 1 0 1 1 1 1 0
instruction_count = 8 instructions * 3 bits = 24 instructions
vector size = 8
instructions per element = 24 / 8 = 3
if word is 128-bit (NEON):
instruction_count = 8 instructions * 3 bits + 0.3 reduce ops = 24.3 instructions
vector size = 128
instructions per element = 24.3 / 128 = 0.19 (5x faster than float)
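The arithmetic on this slide can be reproduced with a small helper. The function name and parameter names are hypothetical; the inputs (8 instructions per bit plane, 3 activation bits, 0.3 amortized reduce ops) are the figures quoted above, and the sketch does not model the float baseline behind the 5x comparison.

```python
def instructions_per_element(instr_per_plane, act_bits, vector_size, reduce_ops=0.0):
    """Instruction count for one vector of elements, divided by the vector size."""
    return (instr_per_plane * act_bits + reduce_ops) / vector_size


scalar_8 = instructions_per_element(8, 3, vector_size=8)
neon_128 = instructions_per_element(8, 3, vector_size=128, reduce_ops=0.3)

print(scalar_8)            # 3.0
print(round(neon_128, 2))  # 0.19
```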
![Page 26: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/26.jpg)
Quantization performance
[Bar chart: speedup on ARM1176, quantized vs. full precision (0 to 25), for 1 bit, 2 bits, 3 bits, 8 bits]
![Page 27: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/27.jpg)
Quantized weight accuracy
[Plot: accuracy vs. original model (0 to 1) against proportion of zeros in ternary weights (0 to 0.7); annotations: model with binary weights, models with trinarized weights]
![Page 28: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/28.jpg)
Quantized activation accuracy
[Plot: accuracy vs. real activations (0 to 1) against quantized activation bit count (1 to 8); series: ternary weights, binary weights]
![Page 29: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/29.jpg)
• post-training lossy compression (pruning and quantization)
• engineering/ML training co-design
• infrastructure:
beating BLAS on embedded platforms
extending platform abstraction layer to embedded GPUs
global optimizer
Current focus areas
![Page 30: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/30.jpg)
Questions?
• https://microsoft.github.io/ELL/
• Code: https://github.com/Microsoft/ELL
• Model Gallery: https://microsoft.github.io/ELL/gallery/
![Page 31: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/31.jpg)
![Page 32: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/32.jpg)
![Page 33: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/33.jpg)
![Page 34: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/34.jpg)
![Page 35: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/35.jpg)
Not every model is a winner
![Page 36: The Embedded Learning Library - tinyML](https://reader034.vdocument.in/reader034/viewer/2022042721/62679fb4ad507076f9763587/html5/thumbnails/36.jpg)