cvml software development tools
TRANSCRIPT
![Page 1: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/1.jpg)
N. Tsapanos, P. Bassia, P. Giannakeris,
Prof. Ioannis Pitas
Aristotle University of Thessaloniki
www.aiia.csd.auth.gr
Version 2.5
CVML software development
tools
1
![Page 2: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/2.jpg)
Distributed computing (under development)
• ROS
• Libraries
• OpenCV
• BLAS, cuBLAS
• MKL DNN, cuDNN
• DNN Frameworks
• Neon, Tensorflow, Pytortch, Keras, MXNet
• Distributed/cloud computing
• MapReduce programming model
• Apache Spark
• Collaborative SW Development
• GitHub/Bitbucket.
![Page 3: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/3.jpg)
Robotic Operating System (ROS)
ROS is used in developing robotic applications• ROS architecture:
• ROS master, ROS nodes.
• Node communications:
• Topics
• Publisher and Subscriber nodes.
• Functionalities:• Node graph visualization • Logging/plotting
• Checks and diagnostics.
![Page 4: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/4.jpg)
ROS architecture
(Adapted from Willow Garage’s “What is ROS?” Presentation)
Master
Publisher
Publisher
Subscriber
Subscriber
/topic
(DNS-like)
![Page 5: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/5.jpg)
ROS architecture
• A Node is a single ROS-enabled program: – Most communication happens between nodes. – Nodes can run on many different devices.
• One Master per system.
ROS Master • Enables nodes to locate one another • Handles Parameter Server
camera interface
image processing
motion logic
robot planning
robot interface
![Page 6: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/6.jpg)
ROS Communications
msg … msg … msg
Publisher Node
Publishing msg data On channel /topic
Publisher Node
Advertises /topic is available
with type msg
Subscriber Node
Subscribed to /topicwith type msg
ROS MASTER
Subscriber Node
Listening for /topicwith type msg
/topic
Topics are for Streaming Data.
msg … msg … msg
![Page 7: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/7.jpg)
Libraries vs Frameworks
7https://www.programcreek.com/
•When you use a library, you are in charge of the flow of the
application. You are choosing when and where to call the library.
•When you use a framework, the framework is in charge of the flow.
It provides some places for you to plug in your code, but it calls the
code you plugged in as needed.
![Page 8: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/8.jpg)
OpenCVOpen Computer vision library:• Image processing.• Video analysis.
• Deep NN, Machine Learning• Object detection.
• Computer vision• Camera calibration, 3D world modeling.
![Page 9: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/9.jpg)
Basic Linear Algebra
Subprograms (BLAS) Library
Basic Linear Algebra Subprograms (BLAS) is a software
library of high performance Linear Algebra routines:
• BLAS has three routine sets (“levels“).
• They correspond to both the chronological order of
definition and publication, as well as the degree of algorithm
complexity.
9
![Page 10: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/10.jpg)
BLAS
BLAS Level 1:
• It contains 𝑂(𝑛) routines for vector operations:
• Dot products, vector norms for vector dimensionality 𝑛;
• A generalized vector addition of the form:
𝐲 ← 𝑎𝐱 + 𝐲.• Several other operations.
10
![Page 11: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/11.jpg)
BLAS
BLAS Level 2:
• It contains 𝑂(𝑛2) matrix-vector operations for vector
dimensionality 𝑛, matrix dimensionality 𝑛 × 𝑛.
• A generalized matrix to vector multiplication:
𝐲 ← 𝛼𝐀𝐱 + 𝛽𝐲.• System solver for system of linear equation:
𝐓𝐱 = 𝐲,
• 𝐓: triangular matrix.
11
![Page 12: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/12.jpg)
BLAS
BLAS Level 3:
• It contains 𝑂(𝑛3) matrix to matrix operations for 𝑛 ×𝑛 matrices:
• General matrix multiplication of the form:
𝐂 ← a𝐀𝐁 + β𝐂,• 𝐀, 𝐁 can optionally be transposed or Hermitian-conjugated inside
the routine;
• All three matrices 𝐀,𝐁, 𝐂 may be strided.
• Routines for solving:
𝐁 ← a𝐓−1𝐁,
• 𝐓 is a triangular matrix.12
![Page 13: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/13.jpg)
BLAS• BLAS is extremely fast.
• There are several implementations.
• Some of them are hardware specific.
• Most of them are heavily optimized:
• Taking advantage of data locality in memory to reduce CPU cache
misses.
• Loop unrolling (probably best handled by the compiler).
• Using multiple threads to execute on every CPU core.
• Employing the Single Input Multiple Data (SIMD) CPU operation
capabilities (Streaming SIMD Extensions).
13
![Page 14: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/14.jpg)
cuBLAS
• NVIDIA cuBLAS library is a fast GPU-accelerated
implementation of the standard basic linear algebra
subroutines (BLAS).
• Speed up is achieved by deploying compute-intensive
operations to a single GPU or scale up and distribute work
across multi-GPU configurations efficiently.
• Data needs to be copied to the GPU memory prior to
calling cuBLAS functions. Results need to be copied
backwards to the CPU memory.
![Page 15: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/15.jpg)
cuBLAS
• It has support for multiple GPUs, concurrent kernels and streams.
• The optimization includes:
• Software pipelining.
• Use of vector memory operations that keeps data coalesced.
• Hide any kind of memory path latencies, with instruction and thread
level parallelism (ILP, TLP).
• Take advantage of each different GPU memory (shared, constant,
texture).
• Loop unrolling.
• cuBLAS 9.2 is up to 35 times faster than BLAS.
15
![Page 16: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/16.jpg)
KL DNN
• Intel Math Kernel Library – Deep Neural Network is an
open source, performance-enhancing library for
accelerating deep learning frameworks on Intel Architecture.
• Optimizations are abstracted and integrated directly into
PyTorch, MXNet frameworks.
• Makes use of SIMD instructions available on Intel
processors
• Engine choice available (CPU/GPU).
16
![Page 17: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/17.jpg)
cuDNN
• The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-
accelerated library of primitives for deep neural networks.
• Standard routines:
• Forward and backward convolution.
• Pooling.
• Normalization layers.
• Activation layers.
• cuDNN accelerates widely used deep learning frameworks:
• Caffe, Keras, MATLAB, TensorFlow, Torch.
17https://developer.nvidia.com/cudnn
![Page 18: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/18.jpg)
DNN Frameworks
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
![Page 19: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/19.jpg)
DNN Frameworks• All 5 frameworks work with cuDNN as backend.
• cuDNN unfortunately is not open source.
• cuDNN supports FFT and Winograd convolutions.
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
![Page 20: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/20.jpg)
DNN Frameworks
Neon
• Intel Neon is a modern deep learning framework created by
Nervana Systems.
• Implemented in Python, while Nervana Caffe framework is
written in C and C++.
• Impressively fast compared to other frameworks.
• Image processing oriented (not general purpose enough).
20
![Page 21: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/21.jpg)
DNN FrameworksNeon:
• Written in Python and C;
• It does not support Windows;
• Uses MKL for CPU (highly optimized by Intel);
• Supports CUDA for GPU;
• Known mostly to be the first to implement Winograd
convolutions faster than others;
• The library is open source and it is an good starting point for
those interested to study the code.
![Page 22: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/22.jpg)
DNN FrameworksTensorFlow (TF)
• For Python (but also APIs in C++, C#, JavaScript, Java).
• It operates with static computation graphs.
• Recommended for production (cloud, mobile).
• Supported by Google.
• Scalable.
Pytorch
• Main competitor of TF.
• Dynamically updated graph (advantage over TF).
• Suited for research, small projects and prototyping.
• Supports distributed learning.22
![Page 23: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/23.jpg)
DNN FrameworksKeras
• High-level API for TF.
• Suitable for quick testing, prototyping.
• Very easy to use, good for beginners.
• Many plug-and-play architectures available.
MXNet
• Highly scalable.
• Very effective parallelization on multiple GPUs and machines.
• Supports many languages (C ++, Python, R, Julia, JavaScript,
Scala, Go, Perl).
• Supported by Apache.23
![Page 24: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/24.jpg)
DNN Frameworks
Recommended for beginners:
Recommended for research:
Recommended for production
(cloud or mobile):
24
![Page 25: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/25.jpg)
Distributed/cloud
computing
• An alternative to expensive Graphics cards and
supercomputers.
• It can consist of any combination of PCs of
various processing power.
• Extremely flexible and scalable.
• Designed to handle Big Data.
![Page 26: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/26.jpg)
MapReduce programming
model
▪ This programming model was developed to implement
the Map and Reduce commands, which can, in turn, be
used to write distributed programs with ease.
▪ The Map command applies a function on every
element of the distributed dataset. This can be used to
filter, sort, or transform the data.
▪ The Reduce command collects the distributed
dataset, or the results of a previous Map or Reduce
command, in a summary operation, such as addition.
![Page 27: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/27.jpg)
Apache Spark• Open software for cluster
computing.
• Developed by Apache.
• Distributed big data computing.
• It is based on MapReduce.
• Up to 100 times faster than
Hadoop MapReduce.
• Machine Learning, Graph
Processing, SQL, Streaming
modules.
![Page 28: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/28.jpg)
Apache Spark
▪ The Apache Spark framework is an implementation of
the MapReduce programming model.
▪ It provides the Map and Reduce command, among
others, in a fault-tolerant environment.
▪ Its advantages over other alternatives, like Hadoop,
include the ability to cache data in memory, in order to
minimize disk access.
▪ The data are stored in data collections named Resilient
Distributed Datasets (RDDs).
![Page 29: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/29.jpg)
Apache Spark
• Supported languages include Java, Scala and Python.
• Spark also includes specialized APIs, like GraphX for
graph processing and SparkML for Machine Learning.
• It is possible to use native libraries to process data
(avoid OpenJDK).
![Page 30: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/30.jpg)
Apache Spark
▪ In Spark, a master node coordinates several worker
nodes.
▪ We used VirtualBox VMs running Ubuntu to set up our
computing cluster.
![Page 31: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/31.jpg)
Apache Spark Mllib
• Machine Learning Library:
• Clustering
• Classification
• Regression
• Data modeling
• Dimensionality reduction
• Available in Pyspark (Spark
wrapper for Python).
other partners offering
![Page 32: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/32.jpg)
Collaborative SW Development
32
• Modern on-line collaborative development platforms provide a common
project repository hosted on-cloud.
• Several features are typically available:
• Distributed version control.
• Source code management.
• Access control.
• Bug tracking.
• Task management.
• Web interface.
![Page 33: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/33.jpg)
Collaborative SW Development
33
• Several alternatives exist.
• Most popular front-end services:
• GitHub.
• Bitbucket.
• Most popular underlying open-source version control systems:
• Git.
• Mercurial.
![Page 34: CVML software development tools](https://reader033.vdocument.in/reader033/viewer/2022042307/625b5896c7305f5003430aaf/html5/thumbnails/34.jpg)
34
References[BRA2008] Bradski G, Kaehler A. Learning OpenCV: Computer vision with the
OpenCV library. " O'Reilly Media, Inc."; 2008 Sep 24.
[KIR2016] Kirk DB, Wen-Mei WH. Programming massively parallel processors: a
hands-on approach. Morgan kaufmann; 2016.
[ABA2016] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M,
Ghemawat S, Irving G, Isard M, Kudlur M. Tensorflow: A system for large-scale
machine learning. In12th {USENIX} Symposium on Operating Systems Design and
Implementation ({OSDI} 16) 2016 (pp. 265-283).
[DEA2010] Dean J, Ghemawat S. MapReduce: a flexible data processing tool.
Communications of the ACM. 2010 Jan 1;53(1):72-7.
[ZAH2026] Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X,
Rosen J, Venkataraman S, Franklin MJ, Ghodsi A. Apache spark: a unified engine
for big data processing. Communications of the ACM. 2016 Oct 28;59(11):56-65.
[KET2017] Ketkar N. Introduction to pytorch. InDeep learning with python 2017 (pp.
195-208). Apress.