sejong - on-device ai for mobile and consumer devices: from...
TRANSCRIPT
![Page 1: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/1.jpg)
On-Device AI for Mobile and Consumer Devices:From DNN Model Compression to Domain-Specific Accelerators
Daehyun Kim
![Page 3: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/3.jpg)
Robots
JetBot 90 AI+ Bot Care & Bot Handy
![Page 4: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/4.jpg)
TVs
Neo Quantum Processor Smart Trainer
![Page 5: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/5.jpg)
Home Appliances
SmartThings Cooking Family Hub
![Page 6: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/6.jpg)
Mobile Phones
Single Take Camera Bixby Vision
![Page 7: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/7.jpg)
Our AI Vision
![Page 8: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/8.jpg)
Why On-device AI ?
![Page 9: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/9.jpg)
Two ways to implement AI : Cloud vs. On-device
On-device AI can enhance cloud AI in user experience
Centralized AI On-device AI
Privacy Protection
Reliable AI
Context-aware
Fast Response
![Page 10: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/10.jpg)
Enhancing User Experience : Privacy
Important Information about users can be processed on device
“Read themost recent
message”
Device
Voice Assistant
HW Sensor
Microphone (Voice)
No need to
send the voice to cloud
send ‘most recent message’ to cloud
send private information to cloud
![Page 11: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/11.jpg)
Enhancing User Experience : Reliable AI
No matter what happens on network connection or servers, it works!
In elevator In flight Server error
404Page not found
![Page 12: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/12.jpg)
Enhancing User Experience : Context-aware
Device knows you fairly well AI can suggest the best option
1 2 4
Launch Select Select Start to Record- Camera - More - Super Slow Motion
Recommendation
You can use this with“Record a Super slowmotion video”
3
![Page 13: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/13.jpg)
Enhancing User Experience : Fast response
Without latency in network and servers, it executes considerably quickly
No Network Latency
No Server Latency
![Page 14: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/14.jpg)
On-device AI Research
![Page 15: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/15.jpg)
On-device AI Challenges
On-device AI is required to run on limited resources, compared to cloud
Resource GapClouds Devices
Computing
Memory
Power
![Page 16: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/16.jpg)
Our On-Device AI Approach
① Neural Network Model Optimization
② AI System Software
③ AI HW Accelerator
NN Compiler/RuntimeModel Analysis, Scheduling, Memory Optimization, …
NN PipelineMulti-Model Pipeline, Multi-Device Pipeline
Neural ProcessorPower Efficient Specialized Accelerator HW IP
AI+
SW+
HW
Co-Design
Model CompressionPruning, Quantization, …
Model Architecture
NAS, Multi-taking, …
NN TrainingOn-Device Training
FleXORBiQGEMM
nnStreamernnTrainer
SNP
![Page 17: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/17.jpg)
FleXOR
FleXOR: Trainable Fractional Quantization
D. Lee, S Kwon, B. Kim, Y Jeon, B Park, J YunNeurips 2020
![Page 18: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/18.jpg)
FleXOR
• A flexible encryption algorithm/architecture (called “FleXOR”)
to enable fractional sub 1-bit numbers to represent each weight
• Contributions
– XOR-based encryption of quantized bits enhances compression ratio
– XOR-aware training algorithm learns encrypted weights
– High model accuracy with sub 1-bit quantization
![Page 19: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/19.jpg)
Differentiable XOR-gate Network
𝑤𝑖𝑏
𝑏𝑖
𝛼
𝑤𝑖 sign
Existing STE (straight-through estimator)
𝑤𝑖𝑏
𝑏𝑖
𝛼
Our FleXOR
𝑤𝑘𝑒
𝑤𝑙𝑒
sign
sign
𝜕tanh
𝜕tanh
1 ≤ 𝑘, 𝑙 ≤ 𝑁𝑖𝑛 1 ≤ 𝑖 ≤ 𝑁𝑜𝑢𝑡
FleXOR should be able to select the best out of 2^Nin possible outputs that are randomly selected from larger 2^Nout
search space.
𝑦 = 𝑡𝑎𝑛ℎ 𝑥1 × −𝑡𝑎𝑛ℎ(𝑥2) 𝑦 = 𝑡𝑎𝑛ℎ 𝟏𝟎𝑥1 × −𝑡𝑎𝑛ℎ(𝟏𝟎𝑥2)
![Page 20: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/20.jpg)
Experiments
- FleXOR allows reduced memory footprint and bandwidth which are critical for energy-efficient inference designs.
- Even though achieving the best accuracy for 1.0 bit/weight is not the main purpose(e.g., XOR gate may be redundant for Nin=Nout), FleXOR shows the minimum accuracy drop.
![Page 21: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/21.jpg)
BiQGEMM: Matrix Multiplication with Lookup Table for Binary-Coding-based Quantized DNNs
Y. Jeon, B. Park, S. Kwon, B. Kim, J. Yun, D. LeeSC20
BiQGEMM
![Page 22: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/22.jpg)
• Previous Binary-Code Quantization Implementation
• Special H/W for MATMUL operation with Quantized weights
- For non-uniform quantization, CPU/GPU/NPU needs to perform on-chip dequantization in practice.
- Binary-code is then only to reduce memory requirements (not latency) without special H/W
• BiQGEMM
• Dedicated binary-coding-based matrix multiplication unit kernel design using lookup tables
• CPUs and GPUs can utilize quantized matrices to improve performance
+1 -1 +1 -1
-1 -1 -1 +1
-1 -1 +1 -1
+1 +1 +1 -1
𝑎00𝑎01𝑎02𝑎03
+1 -1 +1 -1
-1 -1 -1 +1
-1 -1 +1 -1
+1 +1 +1 -1
𝑎00𝑎01𝑎02𝑎03
CPU
Binary weights are
transferred in ‘Byte’ level
CPU needs to access
binary data in ‘Bit’ level
+1 -1 +1 -1
-1 -1 -1 +1
-1 -1 +1 -1
+1 +1 +1 -1
𝑎00𝑎01𝑎02𝑎03
Storage (HDD, SSD …)
1010 0001 0010 1110 …
1 byte
SDRAM
BiQGEMM
![Page 23: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/23.jpg)
}𝑊 𝐵0 𝐵1 𝐵2
𝑎00…𝑎0𝑛
𝑎10…𝑎1𝑛
𝑎20…𝑎2𝑛
≈ + +𝑋 𝑋.{
0.03 0.16 -0.07-0.17 0.21 -0.100.20 -0.09 0.01
-0.17 0.21 -0.10
+1 -1 -1 +1
-1 +1 +1 -1
+1 -1 -1 +1
...
-1 +1 +1 -1
+1 -1 -1 +1
-1 +1 +1 -1
.
0.15
0.30
0.10
...
0.22
0.35
0.08
𝑩𝟎 = {−𝟏,+𝟏}𝟑𝟔×𝟒𝑨𝟎
𝑿 = ℝ𝟒×𝟑
• For MatMul, computations in are redundant
• Computations using 𝑿 and B values are performed in advance
and stored in lookup tables0.03 and -0.17 are repeatedly accessed
BiQGEMM
![Page 24: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/24.jpg)
0.03 0.16 -0.07-0.17 0.21 -0.100.20 -0.09 0.01
-0.17 0.21 -0.10
1 0 0 1
0 1 1 0
1 0 0 1
...
0 1 1 0
1 0 0 1
0 1 1 0
.
0.15
0.30
0.10
...
0.22
0.35
0.08
𝑩𝟎 = {𝟎, 𝟏}𝟑𝟔×𝟒𝑨𝟎
𝑿 = ℝ𝟒×𝟑
• MatMul computations are replaced with pre-computation and Lookup table access
• Lookup-table size is empirically determined.
• In practice, 8 bits are used as an index with 256 entries
• No redundant computations
• Number of float multiplications are greatly reduced
• B matrix is now access in ‘Byte’ level (No bit-level operation)
Lookup Table
0 0 -(0.03) -(-0.17)
0 1 -(0.03) +(-0.17)
1 0 +(0.03) -(-0.17)
1 1 +(0.03) +(-0.17)Retrieved from a Lookup Table
BiQGEMM
** -1 in 𝑩𝟎 is stored as 0 in memory
![Page 25: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/25.jpg)
Experimental Results
• Speedup over Eigen using 1-thread. Matrix size is given as m-by-1K. Output size (m) and batch size are annotated along the horizontal axis.
(a) PC (i7-7700)
(b) Mobile (Coretex-A76)
![Page 26: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/26.jpg)
nnStreamer
Linux Foundation AI Project for Efficient Machine Learning Pipeline Development and Execution
M. Ham, J. Moon, G. Lim, S. Woo, W. Song, J. Jung,H. Ahn, P. Kapoor, D. Chae, G. Jang, Y. Ahn, J. Lee
https://nnstreamer.ai/https://github.com/nnstreamer/nnstreamer
![Page 27: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/27.jpg)
Neural Network Pipeline
Efficient and flexible pipelines for Neural Networks
1000s Lines of Code
Manual Parallelization
Direct media/hardware Optimization
10s Lines of Pipeline Description
Automatic Pipeline Parallelization
Reusable Module for media/hardware
Queue Depth Post-Process
Queue Segmentation
Mux Synthesis App Pre-process
Ex) AR Application Post-Process
![Page 28: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/28.jpg)
Do Not Reinvent the Wheel
GStreamer– https://gstreamer.freedesktop.org
– Open source multimedia pipeline framework
– Library for constructing graphs of media-handling components
nnStreamer– But, perfect the wheel!
– Extension of GStreamer for AI processing
Neural network as another media filter
Neural network data as tensor stream
28/23
![Page 29: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/29.jpg)
Neural Network to GStreamer Pipeline
Code: src ! sinksr
c
sink
Code: src ! tensor_filter mode=tensorflow-lite model=abc.tflite ! sink
src
sinkTensorflow-lite
abc.tflite
Code: src ! tensor_filter mode=tensorflow model=def.pb ! sink
src
sinkTensorflow
def.pb
Code: src ! tensor_filter mode=caffe2 model=ghi.pb ! sink
src
sinkCaffe2
ghi.pb
![Page 30: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/30.jpg)
Data Conversion
Code: src ! tensor_filter mode=tensorflow-lite model=abc.tflite ! sinksr
c
sinkTensorflow-lite
abc.tflite
1920x1080 h264 3x300x300, float32
Code: src ! decodebin ! videoconvert ! videoscale! video/x-raw,format=RGB,width=300,height=300! tensor_convert! tensor_transform mode=typechg option=float32! tensor_filter mode=tensorflow-lite model=abc.tflite ! sink
src
sinkTensorflow-lite
abc.tflite
1920x1080 h264
3x300x300, float32
deco
debin
1920x1080 yuv rawvi
deoco
nve
rt
1920x1080 RGB
videosc
ale
300x300 RGB
capsf
ilter
Ensure 300x300 RGB
tenso
r_co
nve
rter
tenso
r_transf
orm
3x300x300 uint8
3x300x300 float32
![Page 31: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/31.jpg)
Example: Activity Recognition Sensors
• ~1000 lines 16 lines
• @ 30FPS input, 90.4% 51.4% CPU
• ~40 MiB ~17 MiB Memory (RSS)
Neural Net
![Page 32: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/32.jpg)
nnTrainer
nnTrainer: Towards On-Device Learning for Personalization
Submitted to ATC 21https://github.com/nnstreamer/nntrainer
![Page 33: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/33.jpg)
Light-Weight On-Device Training Framework
nnTrainer– Software framework to train neural network on embedded devices
Personalization– As users keep using AI applications, they get
Faster (ex. 100ms to 50ms)
More accurate (ex. 88% to 95%)
Personalized (ex. A Dog to My Dog)
– While providing privacy
Personal data stay at user devices
Challenges Small data for training
Limited compute/memory resources
![Page 34: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/34.jpg)
nnTrainer Overview
Peak Memory Consumption PyTorch : 1.2 GiB TensorFlow : 1.02 GiB NNTrainer : 0.37 GiB
Optimization of memory usage and training time
Transfer learning & Meta-learning
TFLite / Pytorch model-lelve compatibility
Easy to implement custom operators
Supports Android, Tizen, Linux
nnTrainer System Architecture
![Page 35: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/35.jpg)
Few-Shot Learning: SimpleShot
SimpleShot Implementation SimpleShot Inference Results
![Page 36: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/36.jpg)
Example: HandMoji
![Page 37: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/37.jpg)
SNP
Streaming Line Processing Architecture with a WinogradConvolution Array for 4K 60fps Super-Resolution
Applications
Work-in-Progress
![Page 38: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/38.jpg)
Stream Processing for TVs
Non-Streaming Environment Streaming Environment
![Page 39: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/39.jpg)
Super-Resolution Neural Network Scaling
![Page 40: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/40.jpg)
Accelerator Architecture
Architecture Overview Area Breakdown
![Page 41: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/41.jpg)
Implementation & Evaluation
FPGA Demonstration Comparisons of CNN Hardware Accelerators
![Page 42: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/42.jpg)
AI-SW-HW Co-Design
![Page 43: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/43.jpg)
Efficient Neural Network: Voice
13x smaller neural network can show the similar performance in Speech Recognition
91.6% Accuracy @ 530 MB 91.1 % Accuracy @ 38MB
[Reference] Attention based on-device streaming speech recognition with large speech corpus (Interspeech 2019)
![Page 44: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/44.jpg)
Acceleration of Neural Network: Specialized H/W
Voice Recognition NPU: about 4x less power consumption than CPU
ASR Accelerator Architecture
CPUASR Acceleration
(NPU-based)
PowerConsumption 982mW 276mW
Performance Comparison
* Measured under xRT(real-time factor) <1
![Page 45: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/45.jpg)
On-device AI Deployment
![Page 46: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/46.jpg)
Real FLOPS matter!
Big challenge: How to exploit the “peak” FLOPS into “real” FLOPS ???
0
5
10
15
20
25
30
CONV DECONV
Utiliza
tion (
%)
0.2% UtilizationResults from a commercial NPU IP• Conv : 116.5 MACs/clk (30% Utul.)• Deconv: 1.05 MACs/clk (0.2 % Util.)
Exynos 2100 : 26 TOPS (Peak)Snapdragon 888 : 26 TOPS (Peak)
![Page 47: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/47.jpg)
On-device AI will be embedded in various devices
Visual Display Digital AppliancesMobile
Communications
Big challenge: How to provide the “same” AI experience on variety of devices ???
![Page 48: SEJONG - On-Device AI for Mobile and Consumer Devices: From …prism.sejong.ac.kr/dossa-3/dossa_paper/samsung_slide.pdf · 2021. 2. 27. · 1≤ G, H≤𝑁𝑖 1≤𝑖≤𝑁 FleXOR](https://reader035.vdocument.in/reader035/viewer/2022071510/612e145b1ecc515869429662/html5/thumbnails/48.jpg)
Thank you