TRANSCRIPT
tinyEOD: Small Deep Neural Networks and Beyond for Embedded Vision Applications
Christos Kyrkou and Theocharis Theocharides
KIOS Research and Innovation Center of Excellence and Department of Electrical and Computer Engineering, University of Cyprus
{kyrkou.christos, ttheocharides}@ucy.ac.cy
Summary
• Resource-constrained mobile/embedded devices increasingly
require visual intelligence on high-resolution images.
• High-resolution images imply a steep increase in the amount
of data to process, yet much of that data is redundant.
• We focus on intelligent data reduction techniques.
• We introduce a new way to boost the detection accuracy of
computationally efficient but resolution-limited Convolutional
Neural Networks (CNNs) operating on larger images.
• Considerable improvements across multiple applications: a
2-5x speedup while achieving up to 95% detection accuracy.
Networks remain small (<300 KB) while also exhibiting
significant power savings.
Background
Small CNNs
Tiling, Attention, and Memory
Results
Conclusions, Ongoing and Future Work
References
Acknowledgements
[1] G. Plastiras, C. Kyrkou, and T. Theocharides, "EdgeNet - Balancing
Accuracy and Performance for Edge-based Convolutional Neural Network
Object Detectors," in Proc. 13th International Conference on Distributed
Smart Cameras (ICDSC 2019), ACM, New York, NY, USA, Article 8, 6 pages.
[2] G. Plastiras, C. Kyrkou, and T. Theocharides, "Efficient ConvNet-based
Object Detection for Unmanned Aerial Vehicles by Selective Tile Processing,"
in Proc. International Conference on Distributed Smart Cameras (ICDSC),
Netherlands, Article 3, 6 pages, September 2018.
[3] G. Plastiras, M. Terzi, C. Kyrkou, and T. Theocharides, "Edge
Intelligence: Challenges and Opportunities of Near-Sensor Machine Learning
Applications," in Proc. IEEE 29th International Conference on Application-
specific Systems, Architectures and Processors (ASAP), Milano, Italy,
pp. 1-7, 2018.
[4] C. Kyrkou, G. Plastiras, S. Venieris, T. Theocharides, and C.-S.
Bouganis, "DroNet: Efficient Convolutional Neural Network Detector for
Real-Time UAV Applications," in Proc. Design, Automation & Test in Europe
Conference & Exhibition (DATE), Dresden, Germany, pp. 967-972, March 2018.
Why small deep-neural networks?
Small DNNs are more deployable on embedded processors
• Computation and, even more so, memory are at a premium
• Storing the model on-chip saves power and improves
performance
Small DNNs are faster to train
• More easily updatable over-the-air (OTA)
Small DNNs are more power efficient
• Fewer off-chip memory accesses, which consume orders of
magnitude more power
Small DNNs permit multiple vision tasks to run on the same
platform, e.g., object detection
• Deep Learning and Computer Vision are moving to the
Edge
– Drones are a prime example of a resource-constrained
system, with the additional challenge of detecting objects
at a distance
• Exploration of neural network architectures is key for
deployment on hardware-constrained devices
– Teacher-student models
– Building the model from the ground up: use evolutionary
methods or reinforcement learning to build a network with
minimal overhead
– Apply quantization techniques
– Investigate cascade structures with hierarchical models
– Investigate real-time CNN model selection tailored to the
region-proposal result
• Prior knowledge can further push performance
– Apply informed region selection to discard regions using
a priori knowledge
This work is funded by the European Union’s Horizon 2020
research and innovation programme under grant agreement No
739551 (KIOS CoE) and from the Republic of Cyprus through the
Directorate General for European Programmes, Coordination and
Development.
Figure 2: Single-Shot Detection Framework
Single-Shot Detection
• Splits the input image into an S × S grid; each grid cell
generates bounding boxes and class probabilities.
• Outputs a confidence score indicating how certain the network
is that a predicted bounding box encloses some object.
• Predicts B bounding boxes, a confidence for each box, and
C class probabilities, encoded as an S × S × (B × 5 + C)
tensor.
• More suitable for real-time applications.
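The output encoding above can be decoded with a simple scan over the grid. This is a minimal sketch with hypothetical sizes (S=7, B=2, C=3; the poster does not state the exact values), where each cell stores B boxes of (x, y, w, h, confidence) followed by C class probabilities:

```python
import numpy as np

# Hypothetical grid/box/class counts for illustration only.
S, B, C = 7, 2, 3

def decode_detections(output, conf_thresh=0.5):
    """Decode an S x S x (B*5 + C) single-shot detector output.

    Each grid cell predicts B boxes (x, y, w, h, confidence) plus
    C shared class probabilities. Returns (cell, box, score, class).
    """
    detections = []
    for i in range(S):
        for j in range(S):
            cell = output[i, j]
            class_probs = cell[B * 5:]            # last C entries
            cls = int(np.argmax(class_probs))
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[cls]   # class-specific confidence
                if score >= conf_thresh:
                    detections.append(((i, j), (x, y, w, h), float(score), cls))
    return detections

output = np.zeros((S, S, B * 5 + C))
output[3, 4, 0:5] = [0.5, 0.5, 0.2, 0.3, 0.9]    # one confident box in cell (3, 4)
output[3, 4, B * 5 + 1] = 1.0                    # class 1 is certain
dets = decode_detections(output)
```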
Dealing with the computational cost of CNNs
• Memory Footprint / Performance
– Reduce the number of layers
– Reduce the number of filters
– Reduce filter size
– Increase stride
• Object Size
– Affects accuracy
– Choose input size based on object size
• Input Size
– Increase for better accuracy
– Reduce for better performance
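The knobs above interact multiplicatively. A back-of-the-envelope cost model (hypothetical layer sizes, assuming 'same' padding and ignoring biases) shows why 1x1 filters are cheaper than 3x3, and why doubling the stride quarters the work of a layer:

```python
def conv_cost(in_hw, in_ch, out_ch, k, stride):
    """Parameters and multiply-accumulates (MACs) of one square conv layer."""
    out_hw = in_hw // stride               # output spatial size ('same' padding)
    params = k * k * in_ch * out_ch        # weights, ignoring biases
    macs = params * out_hw * out_hw        # one weight reuse per output pixel
    return params, macs, out_hw

# A 3x3 layer vs. the cheaper 1x1 alternative at the same width:
p3, m3, _ = conv_cost(in_hw=128, in_ch=16, out_ch=32, k=3, stride=1)
p1, m1, _ = conv_cost(in_hw=128, in_ch=16, out_ch=32, k=1, stride=1)

# Increasing the stride from 1 to 2 quarters the MACs of the layer:
_, m_s2, hw = conv_cost(in_hw=128, in_ch=16, out_ch=32, k=3, stride=2)
```

The 3x3 layer costs 9x the parameters and MACs of the 1x1 layer, which is why small architectures mix the two.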
Case-Study: DroNet Architecture
• Trained with a custom database for vehicle detection
• Processes 512x512 images
• Uses 3x3 filters and cheaper 1x1 convolutions
• Progressively reduces the feature-map size by a factor of 2
• Uses a smaller number of filters in the early layers
Figure 3: Parameters affecting CNN compute performance
Figure 4: DroNet Architecture
Case-Study: UAV Vision
• Odroid-XU4 platform with an octa-core Samsung
Exynos-5422 CPU
• Lightweight and capable of being powered by the UAV
platform
• Pedestrian and vehicle detection application
(manually collected dataset)
Figure 6: DJI Matrice 100 UAV and
Computing Platform
Figure 7: Detection Results on Aerial Images of vehicles
An object detection algorithm for UAVs that:
• Discards redundant information and avoids unnecessary
computations
• Avoids downscaling the image and distorting the objects
• Makes smaller objects detectable
1. Tiling
• Separate the input image into smaller regions that can be
fed to the CNN, avoiding input resizing and maintaining
object resolution
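The tiling step can be sketched as follows; the frame and tile sizes are hypothetical (512x512 matches DroNet's input), and the tiles are assumed non-overlapping for simplicity:

```python
import numpy as np

def tile_image(img, tile_h, tile_w):
    """Split an image into non-overlapping tiles matching the CNN input size,
    so objects keep their native resolution instead of being shrunk by resizing."""
    h, w = img.shape[:2]
    tiles, coords = [], []
    for y in range(0, h - tile_h + 1, tile_h):
        for x in range(0, w - tile_w + 1, tile_w):
            tiles.append(img[y:y + tile_h, x:x + tile_w])
            coords.append((y, x))          # top-left corner of each tile
    return tiles, coords

# A 1024x1536 frame yields a 2x3 grid of 512x512 tiles.
frame = np.zeros((1024, 1536, 3), dtype=np.uint8)
tiles, coords = tile_image(frame, 512, 512)
```

Each tile is then a full-resolution CNN input; detections are mapped back to frame coordinates via the stored tile offsets.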
2. Memory Mechanism
• Keep track of detection metrics in each tile over time
• The relative position of objects will not change significantly
over a few successive frames
3. Attention Mechanism
• Selects which tiles are processed by the CNN
• Select the top N tiles above a threshold for processing,
based on statistical information
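One way the memory and attention mechanisms could combine is sketched below. The exponential moving average, its decay factor, and the threshold are illustrative assumptions, not the poster's exact statistics:

```python
def update_memory(memory, tile_scores, alpha=0.7):
    """Memory: blend each tile's new detection score into its running average,
    exploiting the fact that objects move little between successive frames."""
    return [alpha * m + (1 - alpha) * s for m, s in zip(memory, tile_scores)]

def select_tiles(memory, n, threshold):
    """Attention: indices of the top-n tiles whose memory exceeds the threshold."""
    ranked = sorted(range(len(memory)), key=lambda i: memory[i], reverse=True)
    return [i for i in ranked if memory[i] > threshold][:n]

memory = [0.0] * 6                       # six tiles, no detection history yet
memory = update_memory(memory, [0.9, 0.1, 0.0, 0.8, 0.0, 0.2])
active = select_tiles(memory, n=2, threshold=0.1)   # only these tiles hit the CNN
```

Tiles not selected in a frame keep their (decaying) memory, so a tile that held an object recently is revisited before long.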
Figure 8: Detection results on aerial images of pedestrians using the tiling approach.
Notice that different tiles are selected for processing at every frame.
Figure 5: Proposed selective tiling approach with attention and memory
Figure 9: Performance Metrics for Different Platforms, Algorithms and Configurations
Performance
• Up to 30 FPS on an embedded CPU.
• Comparing DroNet with different models demonstrates the
effectiveness of the architecture.
• Similar accuracy to tinyYOLO, but much faster.
• 20% accuracy improvement over the plain resizing approach.
• Small memory footprint of only 283 KB.
• Overall, the tiling strategy can be slower than plain resizing
but is more efficient than processing the whole image,
offering the best trade-off between accuracy and performance.
Figure 1: Visual edge intelligence is a growing necessity for emerging applications where
real-time decision-making is vital.