


tinyEOD: Small Deep Neural Networks and Beyond for Embedded Vision Applications

Christos Kyrkou and Theocharis Theocharides

KIOS Research and Innovation Center of Excellence and Department of Electrical and Computer Engineering, University of Cyprus

{kyrkou.christos, ttheocharides}@ucy.ac.cy

Summary

• Resource-constrained mobile/embedded devices increasingly require visual intelligence on high-resolution images.

• High-resolution images imply a steep increase in the amount of data processed (pixel count grows quadratically with image side length); most of the time, much of that data is redundant.

• We focus on intelligent data reduction techniques.

• We introduce a new way to boost the detection accuracy of computationally efficient but resolution-limited Convolutional Neural Networks (CNNs) when operating on larger images.

• Considerable improvements across multiple applications: 2-5x speedup while achieving up to 95% detection accuracy. Networks remain relatively small (<300 KB) while also exhibiting significant power savings.

Background

Small CNNs

Tiling, Attention, and Memory

Results

Conclusions, Ongoing and Future Work

References

Acknowledgements

References

[1] George Plastiras, Christos Kyrkou, and Theocharis Theocharides, "EdgeNet - Balancing Accuracy and Performance for Edge-based Convolutional Neural Network Object Detectors," in Proceedings of the 13th International Conference on Distributed Smart Cameras (ICDSC 2019), ACM, New York, NY, USA, Article 8, 6 pages.

[2] George Plastiras, Christos Kyrkou, and Theocharis Theocharides, "Efficient ConvNet-based Object Detection for Unmanned Aerial Vehicles by Selective Tile Processing," International Conference on Distributed Smart Cameras (ICDSC), Netherlands, Article 3, 6 pages, 3-4 September 2018.

[3] George Plastiras, Maria Terzi, Christos Kyrkou, and Theocharis Theocharides, "Edge Intelligence: Challenges and Opportunities of Near-Sensor Machine Learning Applications," 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Milano, Italy, pp. 1-7, 2018.

[4] Christos Kyrkou, George Plastiras, Stylianos Venieris, Theocharis Theocharides, and Christos-Savvas Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, pp. 967-972, March 2018.

Why small deep neural networks?

Small DNNs are more deployable on embedded processors

• Computation and, even more so, memory are at a premium

• Storing the model on chip saves power and improves performance

Faster to go through training iterations

• More easily updatable over-the-air (OTA)

Small DNNs are more power efficient

• Fewer off-chip memory accesses, which consume orders of magnitude more power than on-chip accesses

Small DNNs permit multiple vision tasks (e.g., object detection) to run on the same platform

Conclusions, Ongoing and Future Work

• Deep Learning and Computer Vision are moving to the Edge

– Drones are a prime example of a resource-constrained system, with additional challenges for detectability at a distance

• Exploration of neural network architectures is key for deployment on hardware-constrained devices

– Teacher-student models

– Building the model from the ground up: use evolutionary methods/reinforcement learning to build a network with minimal overhead

– Apply quantization techniques

– Investigate cascade structures with hierarchical models

– Investigate real-time CNN model selection tailored to the region proposal result

• Prior knowledge can further push performance

– Apply informed region selection to discard regions using a priori knowledge

Acknowledgements

This work is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE) and by the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.

Figure 2: Single Shot Detection Framework

Single-Shot Detection

• Splits the input image into an S × S grid and, for each grid cell, generates bounding boxes and class probabilities

• Outputs a confidence score that tells us how certain it is that the predicted bounding box encloses some object

• Predicts B bounding boxes, a confidence for each of those boxes, and C class probabilities, encoded as an S × S × (B × 5 + C) tensor

• More suitable for real-time applications
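The tensor encoding above can be illustrated with a small calculation. This is a minimal sketch, not the authors' code: the function names are hypothetical, the values S=7, B=2, C=20 are illustrative only, and the reading of "5" as four box coordinates plus one confidence value is an assumption based on common single-shot detectors.

```python
from typing import Tuple


def detection_tensor_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Return the S x S x (B*5 + C) output shape of a single-shot detector.

    Assumption: each of the B boxes contributes 5 numbers
    (x, y, width, height, confidence); C class scores are shared per cell.
    """
    return (S, S, B * 5 + C)


def detection_tensor_size(S: int, B: int, C: int) -> int:
    """Total number of values the network predicts per image."""
    rows, cols, depth = detection_tensor_shape(S, B, C)
    return rows * cols * depth


# Illustrative (hypothetical) configuration: 7x7 grid, 2 boxes, 20 classes.
shape = detection_tensor_shape(S=7, B=2, C=20)
size = detection_tensor_size(S=7, B=2, C=20)
print(shape, size)
```

Because the whole prediction is one fixed-size tensor produced in a single forward pass, the cost per frame does not depend on the number of objects, which is one reason this family of detectors suits real-time use.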

Dealing with the computational cost of CNNs

• Memory Footprint / Performance

– Reduce the number of layers

– Reduce number of filters

– Reduce filter size

– Increase stride

• Object Size

– Affects the accuracy

– Choose input size based on object size

• Input Size

– Increase for better accuracy

– Reduce for better performance
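To make these trade-offs concrete, the sketch below estimates the cost of a single convolutional layer using the standard multiply-accumulate (MAC) count for a square input. The function names and the example layer (3 input channels, 16 filters) are hypothetical and are not taken from the poster.

```python
from typing import Tuple


def conv_output_size(in_size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution on a square input."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv_layer_cost(in_size: int, in_channels: int, num_filters: int,
                    kernel: int, stride: int = 1,
                    padding: int = 0) -> Tuple[int, int, int]:
    """Return (output size, MACs, parameters) for one conv layer.

    MACs   = out * out * kernel * kernel * in_channels * num_filters
    params = kernel * kernel * in_channels * num_filters + num_filters (biases)
    """
    out = conv_output_size(in_size, kernel, stride, padding)
    weights = kernel * kernel * in_channels * num_filters
    macs = out * out * weights
    params = weights + num_filters
    return out, macs, params


# Hypothetical first layer on a 512x512 RGB image, stride 1 vs. stride 2.
print(conv_layer_cost(512, 3, 16, kernel=3, stride=1, padding=1))
print(conv_layer_cost(512, 3, 16, kernel=3, stride=2, padding=1))
```

In this example, doubling the stride leaves the parameter count unchanged but cuts the MACs by a factor of four, while halving the number of filters would halve both. Input size enters quadratically through the output size, which is why it is the strongest lever on performance.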

Case-Study: DroNet Architecture

• Trained with a custom database for vehicle detection

• Processes 512x512 images

• Makes use of 3x3 filters and cheaper 1x1 convolutions

• Progressively reduces the feature-map size by a factor of 2

• Uses a smaller number of filters at early layers
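The effect of these design choices can be sketched numerically. The snippet below is an illustration only: the number of downsampling stages (5) and the channel count (64) are assumptions chosen for the example and are not DroNet's actual configuration, and the helper names are hypothetical.

```python
from typing import List


def feature_map_sizes(input_size: int, num_stages: int) -> List[int]:
    """Spatial size after each stage when every stage halves the feature map."""
    sizes = [input_size]
    for _ in range(num_stages):
        sizes.append(sizes[-1] // 2)
    return sizes


def conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a kernel x kernel convolution (biases excluded)."""
    return kernel * kernel * in_channels * out_channels


# 512x512 input as on the poster; 5 halving stages is a hypothetical choice.
print(feature_map_sizes(512, 5))

# A 1x1 convolution needs 9x fewer weights than a 3x3 one at equal channels.
print(conv_weights(3, 64, 64), conv_weights(1, 64, 64))
```

Keeping the filter count small in the early layers matters most because those layers operate on the largest feature maps, where every extra filter is multiplied by the most spatial positions.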

Figure 4: DroNet Architecture

Figure 3: Parameters affecting CNN compute performance

Case-Study: UAV vision

• Odroid-XU4 platform with an octa-core Samsung Exynos-5422 CPU

• Lightweight and capable of being powered by the UAV platform

• Pedestrian and vehicle detection application (manually collected dataset)

Figure 6: DJI Matrice 100 UAV and Computing Platform

Figure 7: Detection Results on Aerial Images of vehicles

An object detection algorithm for UAVs that:

• Discards redundant information and avoids unnecessary computations

• Avoids reducing the image resolution and distorting the objects

• Makes smaller objects detectable

1. Tiling

• Separates the input image into smaller regions that can be fed to the CNN directly, which avoids resizing the input image and maintains object resolution

2. Memory Mechanism

• Keeps track of detection metrics in each tile over time

• Exploits the fact that the relative position of objects will not change significantly over a few successive frames

3. Attention Mechanism

• Selects which tiles are processed by the CNN

• Selects the top N tiles above a threshold for processing, based on statistical information
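The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the poster does not specify the per-tile statistics or the scoring rule, so the score used here (last detection count plus a staleness bonus) and all names (`make_tiles`, `TileMemory`, `select_tiles`, `update_memory`, `staleness_weight`) are hypothetical. The image is assumed to be at least as large as one tile.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)


def _starts(length: int, tile: int) -> List[int]:
    """Tile origins along one axis; the last tile is shifted inward to fit."""
    starts = list(range(0, max(length - tile, 0) + 1, tile))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def make_tiles(img_w: int, img_h: int, tile: int) -> List[Tile]:
    """1. Tiling: cover the image with CNN-sized tiles, without resizing."""
    return [(x, y, tile, tile)
            for y in _starts(img_h, tile) for x in _starts(img_w, tile)]


@dataclass
class TileMemory:
    """2. Memory: per-tile statistics kept across frames (assumed fields)."""
    last_detections: int = 0
    frames_since_processed: int = 0


def tile_score(mem: TileMemory, staleness_weight: float = 0.5) -> float:
    # Assumed heuristic: favour tiles that recently held objects, and let
    # long-ignored tiles gain priority so they are eventually revisited.
    return mem.last_detections + staleness_weight * mem.frames_since_processed


def select_tiles(memory: List[TileMemory], top_n: int,
                 threshold: float) -> List[int]:
    """3. Attention: indices of the top-N tiles scoring above the threshold."""
    eligible = [(tile_score(m), i) for i, m in enumerate(memory)
                if tile_score(m) >= threshold]
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in eligible[:top_n]]


def update_memory(memory: List[TileMemory], processed: Dict[int, int]) -> None:
    """Record detection counts for processed tiles; age the others."""
    for i, mem in enumerate(memory):
        if i in processed:
            mem.last_detections = processed[i]
            mem.frames_since_processed = 0
        else:
            mem.frames_since_processed += 1


tiles = make_tiles(1280, 720, 512)            # 512x512 tiles, as DroNet's input
memory = [TileMemory() for _ in tiles]
update_memory(memory, {1: 3, 4: 1})           # pretend the CNN ran on tiles 1, 4
print(select_tiles(memory, top_n=2, threshold=1.0))
```

In a full pipeline, only the selected tiles would be cropped and passed to the CNN each frame, and the resulting detection counts would be fed back through `update_memory`. The staleness term is one possible way to ensure that objects entering a previously empty tile are eventually detected.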

Figure 8: Detection Results on Aerial Images of pedestrians using the tiling approach. Notice that different tiles are selected for processing at every frame

Figure 5: Proposed selective tiling approach with attention and memory

Figure 9: Performance Metrics for Different Platforms, Algorithms and Configurations

Performance

• Up to 30 FPS on an embedded CPU.

• Comparing DroNet with different models demonstrates the effectiveness of the architecture.

• Similar accuracy to tinyYolo while being much faster.

• 20% accuracy improvement over the plain resizing approach.

• Lower memory footprint, requiring only 283 KB.

• Overall, the tiling strategy can be slower than plain resizing but is more efficient than processing the whole image, giving the best trade-off between accuracy and performance.


Figure 1: Visual edge intelligence is a growing necessity for emerging applications where real-time decision-making is vital.

Efficient Object Detection