grokking techtalk #21: deep learning in computer vision

Deep Learning in Computer Vision
Axon @ Grokking, Oct. 28, 2017

Dang Huynh

Education
• Ph.D. in Computer Science (France)

Work
• Jan 2017 – now: Axon Enterprise
• 2015 – 2016: Misfit
• 2011 – 2015: Nokia Bell Labs

Research domains
• Machine vision.
• Data science.
• Telecommunication systems.

We are AXON!

Outline

• Refresh
• Computer vision
• Deep learning in Computer vision
• Theory vs. Reality
• Demo

Refresh: Machine learning and Deep learning

Machine learning
Input data → prediction model → output label

y = F(x)

Machine Learning
$y = 4x_1^3 - 2x_2^2 + 8$

[Diagram labels: inputs $x_0$ and $x_1$; node functions $f(x) = x^3$ and $f(x) = x^2$; weight = 1]

Machine Learning

Challenges
• Relevant data acquisition
• Data preprocessing
• Feature selection
• Model selection: simplicity versus complexity
• Result interpretation

Deep Learning
• Machine Learning with many (deep) hidden layers

[Diagram: Input → Hidden layers → Output]

Why deep learning?

[Plot: performance vs. amount of data, with one curve for deep learning and one for machine learning]

Computer Vision intro

Make computers understand images and video:
- Detection
- Recognition
- Tracking
- Extraction

Computer Vision

Object detection

Still, there are challenges: an object can be…

… partly occluded

… or even fully occluded.

Challenge
We were building a human detector, and we accidentally got a future-human detector!

Traditional approach vs. Deep learning approach

Traditional approach: feature engineering
• Has two eyes?
• Has a nose below the eyes?
• OK, it’s a face!

Deep learning approach: NO feature engineering

Traditional approach vs. Deep learning

ImageNet: 1.2 million images with 1000 object categories

[Chart labels: Deep learning, Tradition]

Source: http://pattern-recognition.weebly.com/

Deep Learning in Computer Vision

Computer Vision
What the computer sees

Red
43 45 21
13 34 12
23 88 55

Green
19 89 27
17 57 29
75 56 94

Blue
19 89 27
17 57 29
75 56 94

y = F(Red, Green, Blue)

3-D input array
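As a rough illustration of the 3-D input array, the sketch below stacks the three channel matrices from the slide into a single [height, width, channels] structure. The helper name `stack_channels` is hypothetical, and plain nested lists stand in for whatever array library would be used in practice.

```python
# Channel values copied from the slide; each channel is a 3x3 matrix.
red = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
green = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
blue = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]


def stack_channels(r, g, b):
    """Combine three height x width channels into a height x width x 3 array."""
    return [
        [[r[i][j], g[i][j], b[i][j]] for j in range(len(r[0]))]
        for i in range(len(r))
    ]


image = stack_channels(red, green, blue)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (3, 3, 3)
print(image[0][0])  # [43, 19, 19] -> the (R, G, B) values of the top-left pixel
```

The model F then takes this whole array as its input x.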

Facial detection

Intuition

[Diagram: Input → Hidden layers → Output]

Facial detection

Convolutional Neural Network (CNN)
Idea: have a filter scan over the image.

[Figure: convolutional process, showing the input matrix (e.g., image), the filter (grey), and the output matrix]

Source: https://github.com/vdumoulin/conv_arithmetic

CNN – Striding and Padding
Control how the filter convolves around the input matrix.

[Figure: input matrix (e.g., image), filter (grey), and output matrix, with stride = 2 and zero-padding = 1]

Source: https://github.com/vdumoulin/conv_arithmetic
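To make the effect of stride and padding concrete, here is a minimal sketch of the usual output-size rule for one spatial dimension: floor((n + 2p - f) / s) + 1. The function name `conv_output_size` is hypothetical, and the 5x5 input used with stride 2 and padding 1 is an assumption, since the slide does not state the input size.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output length along one dimension for input n and filter f."""
    return (n + 2 * padding - f) // stride + 1


# Stride 1, no padding: a 3-wide filter on a 7-wide input.
print(conv_output_size(7, 3))                       # 5

# Stride 2, zero-padding 1 (assumed 5-wide input, 3-wide filter).
print(conv_output_size(5, 3, stride=2, padding=1))  # 3
```

A larger stride shrinks the output; padding grows the effective input so that border pixels are also covered.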

Convolutional operation

7x7 Input
0 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 0 1 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 0 0
0 1 1 0 0 0 0
1 1 0 0 0 0 0

3x3 Filter
1 0 1
0 1 0
1 0 1

5x5 Output
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
1 3 3 1 1
3 3 1 1 0

Input [height1, width1, # of channels]
Filter [height2, width2, # of channels]
Output [height3, width3, # of filters]
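The 7x7 input and 3x3 filter from the slide can be checked with a few lines of plain Python. This is a sketch for a single channel and a single filter, with stride 1 and no padding; `conv2d_valid` is a hypothetical helper, not a library function. Like most CNN frameworks, it does not flip the filter (the filter here is symmetric, so that makes no difference).

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding), summing products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

result = conv2d_valid(image, kernel)
for row in result:
    print(row)
# [1, 4, 3, 4, 1]
# [1, 2, 4, 3, 3]
# [1, 2, 3, 4, 1]
# [1, 3, 3, 1, 1]
# [3, 3, 1, 1, 0]
```

With several channels, the same sum also runs over the channel axis; with several filters, each filter produces one slice of the output.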

Rectified Linear Unit (ReLU)

ReLU: F(y) = max(0, y)

Input
-3  2  0
 1 -1  0
-5  2  4

Output
0 2 0
1 0 0
0 2 4

Non-linear activation function.
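A minimal sketch of ReLU applied element-wise to the 3x3 matrix from the slide; the helper name `relu` is chosen here for illustration.

```python
def relu(matrix):
    """Apply F(y) = max(0, y) to every element of a 2-D list."""
    return [[max(0, value) for value in row] for row in matrix]


y = [
    [-3, 2, 0],
    [1, -1, 0],
    [-5, 2, 4],
]

activated = relu(y)
for row in activated:
    print(row)
# [0, 2, 0]
# [1, 0, 0]
# [0, 2, 4]
```

Negative values are clamped to zero and everything else passes through unchanged.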

Max Pooling

1 0 2 3
4 6 6 8
3 1 1 0
1 2 2 4

Reduces dimensionality and helps avoid overfitting.

Max pool with 2x2 filter and stride 2
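The 4x4 matrix above can be pooled with a short sketch like the following, which keeps only the largest value in each 2x2 window. `max_pool` is a hypothetical helper written for this example.

```python
def max_pool(matrix, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(matrix[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]

pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

The 4x4 input becomes 2x2, so each later layer has a quarter as many values to process.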

Example

Input 24x24x3 → 11x11x28 → 4x4x48 → 3x3x64 → fully connected

Layers:
• Conv: 3x3, MP: 2x2
• Conv: 3x3, MP: 3x3
• Conv: 2x2
• Fully connected

Outputs:
• face/non-face
• bounding box regression

Suppose that all Max Pooling (MP) layers have stride 2.

Input: 24x24x3
Conv: 3x3x3
MP: 2x2 (stride 2)
→ Output dimension: (24 – 3 + 1) / 2 = 11
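The same arithmetic can be carried through every layer of the example. The sketch below assumes that all convolutions use stride 1 with no padding (consistent with the 24 – 3 + 1 step) and that all max-pooling layers use stride 2; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(n, k):
    """Output size of a k x k convolution, stride 1, no padding (assumed)."""
    return n - k + 1


def pool_out(n, k, stride=2):
    """Output size of a k x k max pooling with the given stride."""
    return (n - k) // stride + 1


# (layer type, filter size) in the order shown on the slide.
layers = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 3), ("conv", 2)]

size = 24
sizes = []
for kind, k in layers:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    sizes.append(size)

print(sizes)  # [22, 11, 9, 4, 3]
```

The values after each block (11, 4, 3) match the 11x11x28, 4x4x48 and 3x3x64 feature maps in the example.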

Object scales
• Detect objects of various sizes.

[Figure annotation: scans over]

Source: https://www.pyimagesearch.com

Tradeoffs?
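One common way to handle several object sizes is an image pyramid: the image is repeatedly downscaled and a fixed-size window scans over each level. The sketch below only counts levels and windows to show the cost side of the tradeoff. The scale factor, the 24-pixel window, the step, and both function names are assumptions made for illustration, not values from the talk.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=24):
    """List the (width, height) of each pyramid level, largest first."""
    sizes = []
    while width >= min_size and height >= min_size:
        sizes.append((width, height))
        width, height = int(width / scale), int(height / scale)
    return sizes


def window_count(width, height, window=24, step=4):
    """Number of window positions when scanning one pyramid level."""
    across = (width - window) // step + 1
    down = (height - window) // step + 1
    return across * down


coarse = pyramid_sizes(240, 160, scale=1.5)
fine = pyramid_sizes(240, 160, scale=1.2)

coarse_windows = sum(window_count(w, h) for w, h in coarse)
fine_windows = sum(window_count(w, h) for w, h in fine)

print(len(coarse), coarse_windows)
print(len(fine), fine_windows)
```

A finer scale step covers more object sizes but produces more levels and more windows to classify, so detection gets slower.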

Data augmentation
• Generate more artificial data points from base data.
• Apply with care to other data types!

[Figure: Original, Little noise, Moderate noise, Heavy noise]
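A minimal sketch of noise-based augmentation: each pixel gets a random Gaussian offset and is clipped back to the 0–255 range. The three sigma values standing for "little", "moderate" and "heavy" noise are assumptions, as is the helper name `add_noise`; the talk does not say which noise model was used.

```python
import random


def add_noise(image, sigma, seed=0):
    """Return a copy of a 2-D pixel grid with Gaussian noise, clipped to 0..255."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [
        [min(255, max(0, round(pixel + rng.gauss(0, sigma)))) for pixel in row]
        for row in image
    ]


original = [
    [43, 45, 21],
    [13, 34, 12],
    [23, 88, 55],
]

# Assumed noise levels; tune them for the data at hand.
levels = {"little": 5, "moderate": 20, "heavy": 60}
augmented = {name: add_noise(original, sigma) for name, sigma in levels.items()}

for name, noisy in augmented.items():
    print(name, noisy)
```

Each noisy copy keeps the label of the original image, so one labeled sample becomes several training samples.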

Complex data augmentation

Face rotation

Why data augmentation?

AXON detection: WITHOUT augmentation vs. WITH augmentation

How to benchmark?

Facebook detection

Theory vs. Reality

Deep learning in Computer Vision

Pros:
• DL reduces the need for feature engineering.
• DL outperforms classical Computer Vision approaches.

Cons:
• DL requires a huge amount of data (>100K samples).
• DL is extremely computationally expensive to train (weeks on GPUs).
• DL model structure is a black box.

Performance vs. Portability

[Images: Theory, Reality]

Performance vs. Power consumption

[Images: Theory, Reality; label: Portable battery]

Special hardware for Deep Learning

Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad

• Optimized for specific use cases.
• Not plug-and-play; good engineers are needed to make it work.

Still far from consumer…

Privacy

• The police are our customers, so data privacy is important.
• Can we “extract features” from the private data?

Workflow and toolset

Skin blurring

Facial detection with tracking

License plate detection

Take-home message

Industry perspective

Always consider the following 4 Ps:
• Performance
• Power consumption
• Portability
• Price

Deep learning is not magic: trade-offs always exist!

Thank you

We are Hiring

Full Stack, Research Engineers, Security.

https://jobs.lever.co/axon
