IMAGING APPLICATIONS - AUTOMOTIVE: Technical Presentation

Uploaded by enrico-busto on 13-Apr-2017

TRANSCRIPT

Page 1: Imaging automotive 2015   addfor v002

IMAGING APPLICATIONS - AUTOMOTIVE: Technical Presentation

Page 2:

• Brief Company Presentation
• Synthetic Environment Generation
• New Developments in Image Understanding
• Training and Testing the Vision Systems
• Target Hardware

We develop custom image-recognition systems for aerospace and defence applications, using algorithms such as Deep Convolutional Neural Networks and Region-based Convolutional Neural Networks.

Our algorithms for target recognition and tracking are designed from the outset to run on embedded systems; we target both GPU and FPGA devices.

To train and validate our algorithms, we developed a process for generating photorealistic 3D environments.

These 3D environments are used to produce realistic video streams of the targets under different environmental conditions (lighting, adverse weather, camouflage, point of view).

The same technology can be used to train and test automotive vision systems.

Meeting Agenda

We produce highly realistic virtual environments to train and test vision algorithms.


Page 3:

COMPANY PRESENTATION

Page 4:

Addfor provides specialized IT services and scientific applications.

Main office: Turin

Full-Time Employees: 14

Automotive Partner Companies: 2

Academic Collaboration Agreements: 5

Page 5:

About me:

CTO at Addfor

11 years at MathWorks (Senior Application Engineer, 2000-2011):

Image Processing / Video Processing

Academic Programs

Page 6:

REVENUES

[Pie chart: revenue split across the Automotive, Aerospace, Energy, and Other sectors (10%, 25%, 30%, 35%).]

Page 7:
Page 8:

SYNTHETIC ENVIRONMENT GENERATION

Page 9:

UNREAL ENGINE 4

Page 10:
Page 11:

Environmental Conditions
• Shadows
• Partial occlusions (traffic, vegetation)
• Adverse meteorological conditions
• Road-sign positioning
• Different road-sign shapes in different countries

System Conditions
• Vehicle speed
• Vibrations
• Sensor resolution and color response
• Headlight color and beam shape
• Dirty / scratched lenses
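For synthetic dataset generation, conditions like these are typically sampled at random for each rendered clip (domain randomization). A minimal sketch in Python; every field name and range below is an illustrative assumption, not a value from the presentation:

```python
import random

def sample_scene_conditions(rng=None):
    """Sample one set of environmental/system conditions for a synthetic render.

    All fields and ranges are hypothetical placeholders for whatever the
    actual rendering pipeline exposes.
    """
    rng = rng or random.Random()
    return {
        "weather": rng.choice(["clear", "rain", "fog", "snow"]),
        "shadow_strength": rng.uniform(0.0, 1.0),       # 0 = none, 1 = hard shadows
        "occlusion_ratio": rng.uniform(0.0, 0.5),       # fraction of target hidden
        "vehicle_speed_kmh": rng.uniform(0.0, 130.0),
        "sensor_resolution": rng.choice([(640, 480), (1280, 720)]),
        "dirty_lens": rng.random() < 0.2,               # 20% of renders get a dirty lens
    }
```

Sampling a fresh dictionary per rendered clip gives the training set coverage of the combinations listed above instead of a single nominal condition.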


3D Photorealistic Environments for Automotive Vision Systems


Page 12:

AUTOMATIC TAGGING


Page 13:

Optical System Simulation
• FOV
• Lens flares
• Distortions

CCD/CMOS Physical Simulation
• Sensor resolution
• Photon flux
• Dark current
• Source-follower noise
• A/D conversion
• Integral linearity error
• Quantization noise

Optics and Sensors Simulation
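The physical effects listed on this slide can be chained into a simple numerical sensor model. A rough NumPy sketch; all parameter values are illustrative defaults, not figures from the presentation:

```python
import numpy as np

def simulate_sensor(photon_flux, exposure_s=0.01, qe=0.6, dark_current_e_s=50.0,
                    read_noise_e=8.0, full_well_e=20000, adc_bits=12, rng=None):
    """Toy CCD/CMOS model: shot noise, dark current, read noise, ADC quantization.

    photon_flux: array of photons per pixel per second (the scene radiance map).
    Returns quantized digital numbers as uint16.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Photon arrival is Poisson distributed -> shot noise.
    photons = rng.poisson(photon_flux * exposure_s)
    electrons = photons * qe
    # Thermally generated electrons accumulate during the exposure (dark current).
    electrons = electrons + rng.poisson(dark_current_e_s * exposure_s, photons.shape)
    # Source-follower / read noise, modeled as additive Gaussian.
    electrons = electrons + rng.normal(0.0, read_noise_e, photons.shape)
    # Full-well clipping, then ideal linear A/D conversion with quantization.
    electrons = np.clip(electrons, 0, full_well_e)
    return np.round(electrons / full_well_e * (2 ** adc_bits - 1)).astype(np.uint16)
```

A real simulator would add the integral-linearity error and a measured color response on top of this chain, but the structure stays the same: scene photons in, digital numbers out.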


Page 14:

Other possible testing environments:
• Stereo vision simulation
• Adaptive cruise control with radar simulation
• Pedestrian protection
• Lane-departure systems
• Blind-spot protection
• High-beam assistance
• Camera-positioning simulations

Page 15:

ONBOARD CAMERA - CAMERA POSITIONING SIMULATIONS


Page 16:

IMAGE UNDERSTANDING

Page 17:

Military Prototypes
• Visible and infrared wavelengths
• FPGA and GPU targets
• Old approach: HOG+SVM / Hough transform
• New systems are based on:
  • Aggregated Channel Features as the region-proposal method
  • Fine-tuned AlexNet (CNN) as the main detector
  • SVM as the classifier

Side project (just for fun): we are developing a pedestrian-detection system that exceeds the performance of:
J. Hosang, M. Omran, R. Benenson, B. Schiele. Taking a deeper look at pedestrians. arXiv preprint arXiv:1501.05790, 2015.

Some History about modern image detectors:

Viola-Jones Detector
This detector, proposed in 2001 by Paul Viola and Michael Jones, was the first object-detection framework to provide competitive detection rates in real time.

HOG+SVM
Introduced in 2005 by Navneet Dalal and Bill Triggs for the detection of pedestrians in static images. (Used, for example, in the XYLON logiPDET.)

ACF / LDCF
The Aggregated Channel Features detector is one of the best-known state-of-the-art detectors. We use it as the region-proposal layer. Alternatively, we experiment with LDCF (Locally Decorrelated Channel Features).

CNN
Convolutional Neural Networks are a subclass of deep neural networks. This is the state of the art today: we use one as the main detector.

We develop Advanced Prototypes of State-of-the-Art Image Understanding Systems

Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.

neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. The fully-connected layers have 4096 neurons each.
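Layer dimensions like those quoted above can be sanity-checked with the standard convolution output-size formula. A small helper; the 227-pixel value below is the commonly used effective AlexNet input size, an assumption not stated in the excerpt:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pooling stage:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# First AlexNet conv layer: 11x11 kernels at stride 4 on a 227-pixel effective input.
side = conv_out(227, 11, stride=4)    # 55
# 3x3 overlapping max-pooling with stride 2 then gives:
pooled = conv_out(side, 3, stride=2)  # 27
```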

4 Reducing Overfitting

Our neural network architecture has 60 million parameters. Although the 1000 classes of ILSVRC make each training example impose 10 bits of constraint on the mapping from image to label, this turns out to be insufficient to learn so many parameters without considerable overfitting. Below, we describe the two primary ways in which we combat overfitting.

4.1 Data Augmentation

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations (e.g., [25, 4, 5]). We employ two distinct forms of data augmentation, both of which allow transformed images to be produced from the original images with very little computation, so the transformed images do not need to be stored on disk. In our implementation, the transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. So these data augmentation schemes are, in effect, computationally free.

The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 × 224 patches (and their horizontal reflections) from the 256 × 256 images and training our network on these extracted patches⁴. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent. Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks. At test time, the network makes a prediction by extracting five 224 × 224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence ten patches in all), and averaging the predictions made by the network's softmax layer on the ten patches.
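The translation-and-reflection scheme described above is straightforward to reproduce. A minimal NumPy sketch of the training-time version (random 224 × 224 crop plus optional horizontal flip):

```python
import numpy as np

def random_crop_flip(img, crop=224, rng=None):
    """Extract a random crop x crop patch and flip it horizontally half the time."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)    # random translation offsets
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]             # horizontal reflection
    return patch
```

On a 256 × 256 input this gives roughly the 2048-fold enlargement the excerpt describes, at the cost of one array slice per training example.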

The second form of data augmentation consists of altering the intensities of the RGB channels in training images. Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set. To each training image, we add multiples of the found principal components,

⁴ This is the reason why the input images in Figure 2 are 224 × 224 × 3-dimensional.


The AlexNet Structure

Page 18:
Page 19:

Why Deep Learning Is a Disruptive Technology

Page 20:

The Caltech Pedestrian Dataset - 10 h of video @ 30 Hz - 250,000 annotated frames

Page 21:

Detector Demo

ACF: Aggregated Channel Features

AlexNet: Deep Convolutional NN

SVM: Support Vector Machine
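The three components above are chained as a two-stage detector: ACF proposes candidate boxes cheaply, the CNN describes each crop, and the SVM makes the final call. A structural sketch in Python with caller-supplied stand-ins for the three real models (this is an illustration of the pipeline shape, not Addfor's actual code):

```python
def detect(image, propose, describe, classify, threshold=0.5):
    """Two-stage detection: cheap region proposals, then CNN + SVM verification.

    propose(image)        -> iterable of candidate boxes (e.g. an ACF detector)
    describe(image, box)  -> feature vector for the crop  (e.g. fine-tuned AlexNet)
    classify(features)    -> (label, confidence)          (e.g. a linear SVM)
    """
    detections = []
    for box in propose(image):
        label, confidence = classify(describe(image, box))
        if label == "pedestrian" and confidence >= threshold:
            detections.append((box, confidence))
    return detections
```

The point of the split is economy: the proposal stage discards most of the image, so the expensive CNN runs only on a few hundred candidate windows per frame.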


Page 22:

TARGET HARDWARE

Page 23:

Algorithm Development - GPU Target

Target Hardware:
• NVIDIA Jetson TK1
• NVIDIA Jetson TX1

Technologies:
• CUDA
• Locally Decorrelated Channel Features (LDCF)
• Deep Convolutional Neural Networks (CNN)
• Region-based Convolutional Neural Networks (R-CNN)

Applications:
• Target recognition (military application)
• Target tracking (military application)
• Pedestrian detection
• Traffic-sign recognition
• Vehicle recognition and tracking

We develop custom image-processing applications using deep-learning technologies (DCNN and R-CNN). These methods require big datasets for training. The training datasets are provided by the customer; alternatively, the customer provides the technical specifications of the objects to be recognized and we generate a synthetic dataset with 3D modeling tools such as Maya and Unreal Engine. Once the dataset is available, the training of the system is performed on a GPU cluster.

The final algorithm is validated on an extensive dataset and ported to a format suitable for an embedded GPU processor. When possible, we prefer to use NVIDIA target solutions such as the Jetson TK1 or the new Jetson TX1.

Page 24:

Algorithm Development - FPGA Target

We are developing an easy-to-use Integrated Development Environment to rapidly develop and simulate a customized FPGA-based Convolutional Neural Network.

The rationale behind an FPGA-based implementation of CNNs is mainly related to power efficiency and cost. According to the literature, the power efficiency achieved by FPGA-based implementations of CNNs can only be surpassed by ASIC solutions; however, for low sales volumes (below the order of millions of units) the FPGA alternative is more effective in terms of TCO, since the NRE costs of an ASIC are around 2-3M.

If we consider a fixed area and power budget, CPU solutions are not able to meet the required performance, while, on the other hand, the average utilization of GPU-based implementations is about 40%, leading to wasted power and area.

We allow a designer to define a Convolutional Neural Network (CNN) as a sequence of convolutional and fully connected layers, plus the dimensions of the input image that will be classified by the network.

From this CNN model we generate multiple targets; at the moment, one aimed at CPUs and one aimed at FPGAs. The first target is used to test the overall network on a given dataset; the latter is a streaming-oriented, high-performance FPGA hardware accelerator, both power-efficient and high-throughput.

With respect to state-of-the-art HLS tools, we are able to mitigate the memory pressure of CNN workloads by automatically moving the computation from an iterative to a data-flow style. Furthermore, we can optimally exploit full or partial buffering of data with respect to performance and resource requirements. This, together with the adoption of standard hardware interfaces such as AXI-Stream, allows us to generate a software/hardware system that can be easily integrated into a larger system.

Page 25:

IN DEVELOPMENT…

Page 26:

CNN on FPGA - User Interface:

We are working on a fully automatic software system to generate CNNs directly on FPGAs. This system will handle the scaling and allow the user to directly calculate the trade-off between logic gates and FPS.

The designer will define a Convolutional Neural Network (CNN) as a sequence of convolutional and fully connected layers, plus the dimensions of the input image that will be classified by the network, as shown in Figure 1.

Parameter selection:
• Kernel height and width
• Number of feature maps, both in input and in output
• Hyperbolic tangent functions in the output layers
• Max-pooling kernel
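A network definition of that shape (layer sequence plus input dimensions) can be captured in a small declarative spec that both the CPU and FPGA back ends consume. A hypothetical sketch; the class and field names are invented for illustration and are not from the actual tool:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class ConvLayer:
    kernel_h: int             # kernel height
    kernel_w: int             # kernel width
    in_maps: int              # input feature maps
    out_maps: int             # output feature maps
    pool: int = 1             # max-pooling kernel size (1 = no pooling)
    activation: str = "tanh"  # hyperbolic tangent, as on the slide

@dataclass
class FCLayer:
    in_units: int
    out_units: int

@dataclass
class CNNSpec:
    input_h: int
    input_w: int
    input_c: int
    layers: List[Union[ConvLayer, FCLayer]] = field(default_factory=list)

# A LeNet-style example spec for a 32x32 grayscale input.
spec = CNNSpec(32, 32, 1, [
    ConvLayer(5, 5, 1, 6, pool=2),
    ConvLayer(5, 5, 6, 16, pool=2),
    FCLayer(400, 10),
])
```

Keeping the network description declarative is what lets a generator emit either a CPU test harness or an FPGA accelerator from the same source of truth.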

Page 27:

CONCLUSIONS

Page 28:
Page 29:

WE ARE AT THE END
OF THE BEGINNING

(John Kelly, SVP and Director of IBM Research)

Page 30:

There is a Global Effort to developCOGNITIVE COMPUTING

IBM (IBM.N) said it will invest more than $1 billion to establish a new business unit for Watson.
Reuters - Thu Jan 9, 2014 2:50am EST

"The biggest thing will be Artificial Intelligence," Schmidt (Google Executive Chairman) said at Oasis.
Bloomberg - Mar 6, 2014 10:07 PM GMT+0100

China's top search engine Baidu Inc. has hired Google Inc's former Artificial Intelligence (AI) chief Andrew Ng.
Reuters - Fri May 16, 2014 4:58pm EDT

Page 31:
Page 32:

Addfor scientific applications - advantages:

Fast Development Cycle - Agile software development

Technology Assessments - Custom Algorithms + Libraries

Strong relationships with universities, BUT software-agnostic

Advanced (working) Prototypes

Knowledge Transfer

Page 33:

Addfor s.r.l.
www.add-for.com

P.zza Solferino 7Torino 10121 (TO) - Italy