MODI: Mobile Deep Inference Made Efficient by Edge Computing
Samuel S. Ogden, Tian Guo
Sneak Peek
• MODI aims to be a dynamic solution to a dynamic problem that has previously been solved statically
Background – Mobile Inference
• Increasingly common to use deep learning models within mobile applications
§ Image recognition
§ Speech recognition
[email protected] - MODI
• Two major metrics
─ Model accuracy
─ End-to-end latency
• On-device vs. Remote inference
Mobile Deep Inference Limitations
• Highly constrained resources
─ Battery constraints
─ Limited hardware in the general case
• Highly dynamic environment
─ Variable network conditions
• Common approaches are statically applied
─ Choosing a one-size-fits-all model for on-device inference
─ Using the same remote API for all inference requests
Our Vision: MODI
• How do we balance accuracy and latency based on dynamic constraints?
• Provide a wide array of models
─ Model-usage families and derived models
• Dynamically choose inference location and model
─ Make decision based on inference environment
§ e.g., network, power, model availability
• Make the choice transparent
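A minimal sketch of the kind of decision rule this vision implies. All thresholds, model names, and the `(name, accuracy, latency)` tuple layout below are illustrative assumptions, not MODI's actual policy:

```python
def choose_inference(bandwidth_mbps, battery_pct, local_models, remote_available):
    """Pick an inference location and model from the current environment.

    local_models: list of (name, top1_accuracy, est_latency_ms) tuples
    available on-device; the other arguments describe the environment.
    """
    # Fast network and a reachable edge server: offload to the full model.
    if remote_available and bandwidth_mbps >= 10.0:
        return ("remote", "full_precision")
    # Low battery: run the cheapest (lowest-latency) on-device model.
    if battery_pct < 20 and local_models:
        return ("on_device", min(local_models, key=lambda m: m[2])[0])
    # Otherwise run the most accurate model stored on the device.
    if local_models:
        return ("on_device", max(local_models, key=lambda m: m[1])[0])
    # No local model at all: fall back to remote despite the slow network.
    return ("remote", "full_precision")
```

Transparency then means the application calls one entry point and never sees which branch was taken.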
MODI: System design
(Architecture diagram: a Mobile Device runs MODI On-device Inference and Edge Servers run MODI Remote Inference, coordinated by a Central Manager holding MODI Models & Stats. Requests flow from the mobile device to the edge servers and responses flow back; both tiers report metadata to the central manager, which distributes models to them.)
Design Principles
• Maximize usage of on-device resources
• Storage and analysis of metadata
• Dynamic model selection
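The metadata principle above suggests a per-model record that the central manager accumulates across devices. A minimal sketch assuming a simple running-average scheme; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ModelMetadata:
    """One record per model version; measurements keyed by device type."""
    name: str
    size_mb: float
    top1_accuracy: float
    # device -> (running mean latency in ms, number of samples)
    inference_ms: dict = field(default_factory=dict)

    def record_inference(self, device: str, latency_ms: float) -> None:
        # Fold each new measurement into a per-device running mean, so
        # model selection can use observed rather than assumed latency.
        mean, n = self.inference_ms.get(device, (0.0, 0))
        self.inference_ms[device] = ((mean * n + latency_ms) / (n + 1), n + 1)
```

Dynamic model selection can then rank candidate models by measured latency on the requesting device class.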
Design Questions
• Which compression techniques are useful?
• Which model versions to store where?
• When to offload to edge servers?
[email protected] - MODI10
Results – Model Compression
Storage requirements reduced by 75% for quantized models
InceptionV3 image classification model¹ running on a Google Pixel 2 device
¹ https://arxiv.org/abs/1512.00567
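Quantization accounts for the 75% reduction; gzip (mentioned alongside it later in the deck) adds further savings because quantized weights are highly redundant. A small stand-alone sketch of measuring that second effect with Python's standard library:

```python
import gzip

def gzip_savings(raw: bytes) -> float:
    """Fraction of storage saved by gzip-compressing a model file's bytes."""
    compressed = gzip.compress(raw, compresslevel=6)
    return 1.0 - len(compressed) / len(raw)

# A run of repeated bytes stands in for a redundant quantized weight file;
# real savings depend entirely on the weight distribution.
```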
Results – Model Compression
Load time reduced by up to 66%
─ Leads to ~6% reduction in accuracy
InceptionV3 image classification model¹ running on a Google Pixel 2 device
¹ https://arxiv.org/abs/1512.00567
Design Questions
• Which compression techniques are useful?
─ Quantization and gzip significantly reduce model size
• Which model versions to store where?
• When to offload to edge servers?
Results – Model Comparison across devices
Pixel 2 over 2.5x faster than older devices
─ Specialized deep-learning hardware
InceptionV3 image classification model optimized for inference
Design Questions
• Which compression techniques are useful?
─ Quantization and gzip significantly reduce model size
• Which model versions to store where?
─ Mobile devices can reduce runtime up to 2.4x
• When to offload to edge servers?
Results – Inference Offloading Feasibility
Network transfer is up to 66.7% of end-to-end time
Used AWS t2.medium instance running InceptionV3
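The 66.7% figure suggests a simple break-even test for the offloading decision: offload only when transfer time plus remote compute beats on-device latency. A sketch with illustrative parameters; the RTT default and the specific arguments are assumptions, not MODI's measured policy:

```python
def should_offload(payload_kb, bandwidth_mbps, remote_ms, local_ms, rtt_ms=50.0):
    """True if sending the input to an edge server is expected to be faster.

    payload_kb: input size, e.g. a compressed image
    bandwidth_mbps: measured uplink bandwidth
    remote_ms / local_ms: expected inference time remotely / on-device
    rtt_ms: fixed network round-trip overhead
    """
    # kilobytes -> kilobits, divided by megabits/s, gives milliseconds.
    transfer_ms = payload_kb * 8.0 / bandwidth_mbps
    return transfer_ms + rtt_ms + remote_ms < local_ms
```

On a fast network the transfer term is small and offloading wins; on a slow one it dominates end-to-end time, matching the observation above.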
Design Questions
• Which compression techniques are useful?
─ Quantization and gzip significantly reduce model size
• Which model versions to store where?
─ Mobile devices can reduce runtime up to 2.4x
• When to offload to edge servers?
─ Slower networks would hinder remote inference
Conclusions & Questions
• Key points:
─ MODI allows for dynamic mobile inference model selection through post-training model management
─ Enables greater flexibility for mobile deep inference
• Controversial:
─ Whether using a low-tier AWS instance is representative of edge hardware
• Looking forward:
─ Integrating MODI with existing deep learning frameworks
─ Exploring explicit trade-off points between on-device and remote inference
─ Exploring how far into the edge is ideal for remote inference
─ What other devices could this be used for?