AI On the Edge - Cambridge Wireless · Cyrus M. Vahid, Principal Solutions Architect, Principal...
![Page 1: AI On the Edge - Cambridge Wireless · Cyrus M. Vahid, Principal Solutions Architect, Principal DeepLearningSolution Architect AWS DeepLearning cyrusmv@amazon.com Oct 2017 AI On the](https://reader033.vdocument.in/reader033/viewer/2022041905/5e62b43103ec89028803702b/html5/thumbnails/1.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cyrus M. Vahid, Principal Solutions Architect, Principal Deep Learning Solution Architect
Oct 2017
AI On the Edge
Motivation
Training vs. Inference
• Training is performed in the cloud.
• Inference is performed everywhere.
• Efficient inference is indispensable to address:
  • Latency
  • Connectivity
  • Cost
  • Privacy/Security
Motivation
Large DNNs require huge amounts of memory, e.g.
• AlexNet Caffe model is over 200 MB
• VGG-16 Caffe model is over 500 MB
Complex computation makes apps power hungry.
Edge devices have low power and small memory capacity.
∴ To run models on the edge, we need to compress them significantly.
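A rough check on the sizes quoted above: a float32 model needs about 4 bytes per parameter. The parameter counts below are approximate, commonly cited figures and are an assumption, not something stated on the slide.

```python
# Rough float32 storage estimate: parameter count x 4 bytes.
# The parameter counts are approximate, commonly cited figures
# (an assumption, not taken from the slides).
BYTES_PER_FLOAT32 = 4

approx_params = {
    "AlexNet": 61_000_000,
    "VGG-16": 138_000_000,
}

def model_size_mb(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Return the storage needed for the weights, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

for name, n in approx_params.items():
    print(f"{name}: ~{model_size_mb(n):.0f} MB")
```

Both estimates land above the 200 MB and 500 MB marks, which is consistent with the file sizes on the slide.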
Motivating Examples From Customers
• Industrial IoT (Out of Distribution/Anomaly Detection)
Motivating Examples From Customers
• Real Time Filtering (Neural Style Transfer)
Motivating Examples From Customers
• Building a Better Hearing Aid (Recurrent Acoustic Models)
Motivating Examples From Customers
• Security Robots (Object Detection and Recognition)
Autonomous Vehicles
Model Compression
Computational Efficiency
• The goal is to reduce floating point operations and the number of parameters:
Fast Fourier Transform
Most effective for larger kernels
$2N^2 \rightarrow 2N\ln(N)$
Winograd FFT
2.25 times reduction for $F(2 \times 2, 3 \times 3)$
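The 2.25 figure can be illustrated with the one-dimensional Winograd minimal filtering case F(2, 3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. Nesting it in two dimensions gives 16 multiplications instead of 36, i.e. 36/16 = 2.25. The function names below are hypothetical; this is a sketch, not the implementation of any particular library.

```python
import numpy as np

def direct_f23(d, g):
    """Two outputs of a 3-tap filter the direct way: 6 multiplications."""
    return np.array([
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ])

def winograd_f23(d, g):
    """Same two outputs with 4 data-dependent multiplications.

    u1 and u2 depend only on the filter weights, so they can be
    precomputed once per filter and are not counted per tile.
    """
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

rng = np.random.default_rng(0)
d = rng.standard_normal(4)   # input tile of 4 samples
g = rng.standard_normal(3)   # 3-tap filter
assert np.allclose(direct_f23(d, g), winograd_f23(d, g))
```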
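Tensor Contraction Layer
$I = \begin{pmatrix} x_{1,1} & \cdots & x_{n,1} \\ \vdots & \ddots & \vdots \\ x_{1,m} & \cdots & x_{n,m} \end{pmatrix}_{n \times m}$
$F = \begin{pmatrix} f_{1,1} & \cdots & f_{k,1} \\ \vdots & \ddots & \vdots \\ f_{1,l} & \cdots & f_{k,l} \end{pmatrix}_{k \times l}$
$F = F_1 \otimes \dots \otimes F_k$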
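One reading of the Kronecker factorization above, sketched for the two-factor matrix case: a large linear map built as a Kronecker product can be applied by contracting each mode of the input with its own small matrix, which needs far fewer parameters. The shapes and names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))   # input matrix (p x q), hypothetical size
B = rng.standard_normal((2, 5))   # factor for the first mode (r x p)
A = rng.standard_normal((3, 6))   # factor for the second mode (s x q)

# Dense map: one big matrix built from the Kronecker product.
dense = np.kron(A, B)                      # shape (s*r, q*p) = (6, 30)
y_dense = dense @ X.flatten(order="F")     # column-major vec(X)

# Factored map: contract each mode with its own small matrix.
Y = B @ X @ A.T                            # shape (r, s) = (2, 3)
y_factored = Y.flatten(order="F")

# Identity used: vec(B X A^T) = (A kron B) vec(X), with column-major vec.
assert np.allclose(y_dense, y_factored)
print("parameters:", dense.size, "dense vs", A.size + B.size, "factored")
```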
Separable Kernels
$O(w^d) \rightarrow O(w \times d)$
Very effective on CPU
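The cost reduction for separable kernels can be sketched with a rank-1 3×3 kernel: filtering with the full kernel matches two one-dimensional passes, at roughly w × d instead of w^d multiplications per output. The helper `conv2d_valid` is a hypothetical naive implementation written for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation: kh * kw multiplications per output."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
col = np.array([1.0, 2.0, 1.0])    # vertical 1-D factor
row = np.array([-1.0, 0.0, 1.0])   # horizontal 1-D factor
kernel = np.outer(col, row)        # rank-1 (separable) 3x3 kernel

full = conv2d_valid(image, kernel)
# Two 1-D passes: down the columns first, then along the rows.
separable = conv2d_valid(conv2d_valid(image, col[:, None]), row[None, :])

assert np.allclose(full, separable)
w, d = 3, 2
print("approx. multiplications per output:", w ** d, "vs", w * d)
```

The equivalence only holds when the kernel factors into an outer product; a general kernel would need a low-rank approximation first.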
Model Compression: Pruning-Quantization-Encoding
arXiv:1510.00149v5
Model Compression: Pruning
• Pruning removes connections that contribute little to the computation of a network.
• After training, all weights smaller than a certain threshold are removed, and the model is retrained.
• A 9-13 times reduction in the number of parameters without loss of accuracy has been shown. [arXiv:1510.00149v5]
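The threshold step above can be sketched in NumPy as follows. The threshold value, the layer shape, and the stand-in gradient are hypothetical; a real pipeline would retrain with the framework's own optimizer.

```python
import numpy as np

def prune_by_magnitude(weights, threshold):
    """Zero out weights with |w| < threshold; return pruned weights and mask."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256))   # stand-in for a trained layer
pruned, mask = prune_by_magnitude(weights, threshold=1.5)

kept = int(mask.sum())
print(f"kept {kept} of {weights.size} weights "
      f"({weights.size / kept:.1f}x fewer parameters)")

# During retraining, the mask is re-applied after every update so that
# pruned connections stay at zero (the gradient here is only a stand-in).
fake_gradient = rng.standard_normal(weights.shape)
pruned = (pruned - 0.01 * fake_gradient) * mask
```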
Model Compression: Quantization
• Quantization is about using fewer bits to express the same information.
• Weight sharing is one method of quantization that uses centroids as shared weights.
[arXiv:1510.00149v5]: weight sharing through scalar quantization
Good to take advantage of low precision hardware acceleration
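Weight sharing with centroids can be sketched as a small one-dimensional k-means over a layer's weights: each weight is then stored as a short cluster index plus a shared table of centroids. The function name, the linear initialisation, and the 2-bit setting are assumptions made for this example.

```python
import numpy as np

def share_weights(weights, bits=2, iterations=20):
    """Cluster weights into 2**bits centroids with a simple 1-D k-means.

    Returns (indices, centroids); indices has the shape of `weights`.
    """
    flat = weights.ravel()
    k = 2 ** bits
    # Linear initialisation between the smallest and largest weight.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iterations):
        indices = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            members = flat[indices == c]
            if members.size:  # leave empty clusters where they are
                centroids[c] = members.mean()
    indices = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return indices.reshape(weights.shape), centroids

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))   # stand-in for a trained layer
indices, centroids = share_weights(weights, bits=2)
quantized = centroids[indices]          # every weight replaced by its centroid

print("distinct values:", np.unique(quantized).size)
print("mean abs error:", float(np.abs(weights - quantized).mean()))
```

With 2-bit indices, the 16 weights need 32 bits of indices plus four shared centroids, instead of 16 full-precision values.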
Model Compression: Huffman Coding
• A Huffman code is an optimal prefix code commonly used for lossless data compression.
• It uses variable-length code words to encode source symbols.
• More common symbols are represented with fewer bits.
[arXiv:1510.00149v5]
• Probability distribution of quantized weights and the sparse matrix index of the last fully connected layer in AlexNet.
• Most of the quantized weights are distributed around the two peaks; the sparse matrix index differences are rarely above 20.
• Experiments show that Huffman coding these non-uniformly distributed values saves 20%-30% of network storage.
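A minimal sketch of Huffman coding applied to quantized weight indices, using only the Python standard library. The index distribution is hypothetical and chosen to be skewed, so the frequent index gets the shortest code word; it is not data from the paper.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code; returns {symbol: bitstring}."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie_breaker, {symbol: code_so_far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical cluster indices from a 2-bit quantized layer: index 1 dominates.
indices = [1] * 60 + [2] * 25 + [0] * 10 + [3] * 5
code = huffman_code(indices)
encoded_bits = sum(len(code[s]) for s in indices)
fixed_bits = 2 * len(indices)
print(code)
print(f"{encoded_bits} bits Huffman-coded vs {fixed_bits} bits fixed-width")
```

The saving depends entirely on how non-uniform the index distribution is; a uniform distribution would gain nothing over fixed-width codes.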
BMXNet: collaborators in the MXNet community brought this to binary weights. https://github.com/hpi-xnor/BMXNet
Reduced Architecture
SqueezeNet: AlexNet Accuracy with 50x Fewer Parameters
Good for devices with low RAM that can’t hold all weights for larger models concurrently in memory
Student/Teacher training
Comparing Techniques
| | Winograd Convolutions | Separable Convolutions | Quantization | Tensor Contractions | Sparsity Exploitation | Weight Sharing |
|---|---|---|---|---|---|---|
| CPU Acceleration | + | ++ | = | ++ | + | + |
| GPU Acceleration | + | + | + | + | = | + |
| Model Size | = | = | - | - | - | - |
| Model Accuracy | = | - | - | - | - | - |
| Specialized Hardware Acceleration | + | + | ++ | + | + | + |
Edge Compute Models – AWS IoT
Key Functions
• Data Ingest
• Compressed Inference
• Full Inference / Trained Model Query
• Model Training
Deployment Models
• Cloud <-> Edge
• Cloud <-> Hub <-> Edge
Edge Analytics Trends: Reduce Latency, Reduce Transfer Costs
AWS Deep Learning Infrastructure Tools
P2 Instances: Up to 40K CUDA Cores
Deep Learning AMI: Preconfigured for Deep Learning (MXNet, TensorFlow, …)
CFM Template: Deep Learning Cluster
Apache MXNet
Most Open: Accepted into the Apache Incubator
Best On AWS: Optimized for deep learning on AWS
Amazon AI: Scaling With MXNet
[Figure: scaling curves for Inception v3, ResNet, and AlexNet against the ideal line; x-axis ticks from 1 to 256 in powers of two; annotated "88% Efficiency".]
Manage and Monitor Models on The Fly
AWS
Captured Data
Upload Tagged Data
Escalate to AI Service
Escalate to Custom Model on P2
Deploy and Manage Model
Local Learning Loop
Poorly Classified Data
Updated Model
Fine Tune Model With Accurate Classification
References
• arXiv:1510.00149v5: Deep Compression; Han, Mao, and Dally
• arXiv:1509.09308v2: Fast Algorithms for CNN; Lavin & Gray
• arXiv:1706.00439v1: Tensor Contraction Layers; Anima Anandkumar et al.
• arXiv:1606.09274v1: Compression of NMT via Pruning; See, Luong, Manning
• http://cs231n.stanford.edu/reports/2016/pdfs/117_Report.pdf: Pruning Winograd and FFT based algorithms; Liu and Turakhia
• https://colfaxresearch.com/falcon-library/
• https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
• https://en.wikipedia.org/wiki/Fast_Fourier_transform
• https://arxiv.org/pdf/1611.06321.pdf: Learning the Number of Neurons in Deep Networks
• https://aclweb.org/anthology/D16-1139: Sequence Level Knowledge Distillation; Kim and Rush