Lecture 10: Other Applications of CNNs

Bohyung [email protected]

CSED703R: Deep Learning for Visual Recognition (2017F)

Applications of Convolutional Neural Networks

• Face recognition and verification

• Person re-identification

• Text region detection

• Style transfer

• Object generation

• Visual attention and saliency

• Visual analogy

• Autonomous driving

• Many others

Neural Artistic Style

• Main goal

§ Synthesizing an image from two inputs, combining the content of one with the style of the other

§ Exploiting a CNN pretrained for image classification

• CNN

§ VGG 19-layer network without fully connected layers

§ No fine-tuning

§ Average pooling: improves gradient flow and gives more appealing results

[Gatys16] L. A. Gatys, A. S. Ecker, M. Bethge: Image Style Transfer Using Convolutional Neural Networks. CVPR 2016

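Method

[Figure: the content image and the style image are fed to the CNN. The output is optimized with a loss on feature maps (content) plus a loss on feature map correlations (style), which yields the final output.]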


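Optimization

• Loss

§ $p$ and $P^l$: original content image and its feature map in the $l$-th layer

§ $x$ and $F^l$: generated image and its feature map in the $l$-th layer

§ $a$ and $S^l$: original style image and its feature map in the $l$-th layer

§ $F^l, P^l, S^l \in \mathbb{R}^{N_l \times M_l}$, where $N_l$ is the number of feature maps and $M_l$ is the size of each feature map, $M_l = \text{width}_l \times \text{height}_l$

§ $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$, $G^l \in \mathbb{R}^{N_l \times N_l}$: correlation of feature maps of the generated image in the $l$-th layer

§ $A^l_{ij} = \sum_k S^l_{ik} S^l_{jk}$, $A^l \in \mathbb{R}^{N_l \times N_l}$: correlation of feature maps of the style image in the $l$-th layer

$$\mathcal{L}_{total}(p, a, x) = \alpha \mathcal{L}_{content}(p, x, l) + \beta \mathcal{L}_{style}(a, x)$$

$$\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

$$\mathcal{L}_{style}(a, x) = \frac{1}{2} \sum_{l}^{L} \frac{w_l}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$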
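The content and style losses can be sketched on toy feature maps. This is a minimal illustration rather than the authors' implementation: feature maps are plain nested lists with one row per feature map, the numeric values are made up, and the trade-off weights `alpha` and `beta` are hypothetical.

```python
def gram(feat):
    """Correlation matrix: entry (i, j) is sum_k feat[i][k] * feat[j][k]."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in feat]
            for row_i in feat]

def content_loss(F, P):
    """0.5 * sum_ij (F_ij - P_ij)^2 for one layer."""
    return 0.5 * sum((f - p) ** 2
                     for f_row, p_row in zip(F, P)
                     for f, p in zip(f_row, p_row))

def layer_style_energy(F, S):
    """1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2 for one layer."""
    n, m = len(F), len(F[0])
    G, A = gram(F), gram(S)
    sq = sum((g - a) ** 2
             for g_row, a_row in zip(G, A)
             for g, a in zip(g_row, a_row))
    return sq / (4.0 * n ** 2 * m ** 2)

def style_loss(Fs, Ss, weights):
    """0.5 * sum_l w_l * (layer energy) over the selected layers."""
    return 0.5 * sum(w * layer_style_energy(F, S)
                     for F, S, w in zip(Fs, Ss, weights))

# Toy layer: 2 feature maps, each of size 3 (values are made up).
F = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # generated image
P = [[1.0, 1.0, 2.0], [0.0, 1.0, 0.0]]  # content image
S = [[0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]  # style image

alpha, beta = 1.0, 1000.0  # hypothetical trade-off weights
total = alpha * content_loss(F, P) + beta * style_loss([F], [S], [1.0])
```

Because the Gram matrix sums over spatial positions, the style term ignores where a feature fires and keeps only which features fire together.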


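Optimization

• Error back-propagation

§ Content: select a particular layer such as conv4_2

§ Style: use conv*_1 (conv1_1 through conv5_1) with equal weights ($w_l = 0.2$)

$$\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

$$\mathcal{L}_{style}(a, x) = \frac{1}{2} \sum_{l}^{L} w_l E_l, \qquad E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

Update rule (content):

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} F^l_{ij} - P^l_{ij} & \text{if } F^l_{ij} \geq 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

Update rule (style):

$$\frac{\partial \mathcal{L}_{style}}{\partial F^l_{ij}} = \frac{\partial \mathcal{L}_{style}}{\partial E_l} \frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \frac{w_l}{2 N_l^2 M_l^2} \left( (F^l)^\top (G^l - A^l) \right)_{ji} & \text{if } F^l_{ij} \geq 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$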
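As a sanity check on the style update rule, the gradient of a single layer's style energy can be compared against a finite difference. The sketch below is illustrative only: it treats the feature map as a free variable, uses made-up values, and all function names are hypothetical. It uses the per-layer gradient (G - A) F / (N^2 M^2), which is what differentiating the energy 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2 gives when G and A are symmetric.

```python
def gram(F):
    """Gram matrix of the rows of F."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in F] for ri in F]

def energy(F, A):
    """1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2, with G the Gram matrix of F."""
    n, m = len(F), len(F[0])
    G = gram(F)
    sq = sum((G[i][j] - A[i][j]) ** 2 for i in range(n) for j in range(n))
    return sq / (4.0 * n ** 2 * m ** 2)

def energy_grad(F, A):
    """Analytic gradient (G - A) F / (N^2 M^2), set to zero where F < 0."""
    n, m = len(F), len(F[0])
    G = gram(F)
    grad = [[sum((G[i][k] - A[i][k]) * F[k][j] for k in range(n)) / (n ** 2 * m ** 2)
             for j in range(m)] for i in range(n)]
    return [[g if F[i][j] >= 0 else 0.0 for j, g in enumerate(row)]
            for i, row in enumerate(grad)]

def numeric_grad(F, A, i, j, eps=1e-6):
    """Central finite difference of the energy with respect to F[i][j]."""
    plus = [row[:] for row in F]
    minus = [row[:] for row in F]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (energy(plus, A) - energy(minus, A)) / (2 * eps)

F = [[1.0, 0.5, 2.0], [0.3, 1.0, 1.0]]  # made-up non-negative activations
A = [[1.0, 0.0], [0.0, 5.0]]            # made-up style Gram matrix

analytic = energy_grad(F, A)
max_err = max(abs(analytic[i][j] - numeric_grad(F, A, i, j))
              for i in range(2) for j in range(3))
```

Multiplying this per-layer gradient by w_l / 2 gives the contribution of layer l to the gradient of the style loss.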


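Generated Images

[Figure: a source image rendered in two styles. Style 1: The Starry Night. Style 2: The Scream.]


More Examples


Balance between Content and Style


Multiple Styles

[Figure: a source image rendered with both Style 1 (The Starry Night) and Style 2 (The Scream).]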

Face Verification

• Definition

§ Given two faces, determine whether they belong to the same person or not.

§ Binary decision by one-to-one matching

• Related problems

§ Face detection: finding faces

§ Face recognition: multi-class classification problem

• Standard pipeline

Face Detection → Face Alignment → Feature Extraction → Binary Classification

Siamese Network

• Discriminative Deep Metric Learning (DDML)

§ Learning a distance metric

§ Representation learning

§ Two branches share weights.

§ Objective

$$d_f^2(x_i, x_j) = \left\| f(x_i) - f(x_j) \right\|_2^2$$

$$\ell_{ij} \left( \tau - d_f^2(x_i, x_j) \right) > 1$$

$\ell_{ij} = 1$: same ID

$\ell_{ij} = -1$: different ID

$\tau > 1$

[Hu2014] J. Hu, J. Lu, Y.-P. Tan: Discriminative Deep Metric Learning for Face Verification in the Wild. CVPR 2014
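The margin constraint says that same-ID pairs should have squared distance below the threshold minus 1, and different-ID pairs above the threshold plus 1. A minimal sketch of checking it on embedded points follows. The hinge penalty is an assumed surrogate chosen for illustration, not necessarily the loss used in the paper, and the embeddings and threshold are made up.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedded points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def satisfies_margin(fi, fj, label, tau):
    """True when label * (tau - d^2) > 1; label is +1 (same ID) or -1 (different ID)."""
    return label * (tau - sq_dist(fi, fj)) > 1

def hinge_penalty(fi, fj, label, tau):
    """Assumed surrogate: max(0, 1 - label * (tau - d^2))."""
    return max(0.0, 1.0 - label * (tau - sq_dist(fi, fj)))

tau = 3.0  # hypothetical threshold
anchor = [0.0, 0.0]
same_id = [0.5, 0.5]       # d^2 = 0.5, well inside tau - 1
far_other = [2.0, 1.5]     # d^2 = 6.25, beyond tau + 1
close_other = [1.0, 1.0]   # d^2 = 2.0, too close for a different ID

ok_same = satisfies_margin(anchor, same_id, 1, tau)
ok_far = satisfies_margin(anchor, far_other, -1, tau)
ok_close = satisfies_margin(anchor, close_other, -1, tau)
penalty_close = hinge_penalty(anchor, close_other, -1, tau)
```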

DeepID I

• Deep hidden IDentity features (DeepID)

97.45% verification accuracy: close to human performance (97.53%)

[Sun2014a] Y. Sun, X. Wang, X. Tang: Deep Learning Face Representation from Predicting 10,000 Classes. CVPR 2014

CNN Architecture

• CNN architecture

$$y_j = \max \left( 0, \sum_i x_i^1 \cdot w_{i,j}^1 + \sum_i x_i^2 \cdot w_{i,j}^2 + b_j \right)$$

Multiple scales
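A single unit of this layer can be sketched as a ReLU over two weighted sums, one per input. The sketch assumes that the two inputs come from two different scales of the network, as the "multiple scales" label suggests. All values and the function name are made up for illustration.

```python
def deepid_unit(x1, w1, x2, w2, b):
    """max(0, sum_i x1_i * w1_i + sum_i x2_i * w2_i + b) for one output unit."""
    pre = (sum(x * w for x, w in zip(x1, w1))
           + sum(x * w for x, w in zip(x2, w2))
           + b)
    return max(0.0, pre)

x1, w1 = [1.0, 2.0, 0.0], [0.5, -1.0, 2.0]  # first input and its weights
x2, w2 = [3.0, 1.0], [1.0, 0.5]             # second input and its weights

active = deepid_unit(x1, w1, x2, w2, b=-1.0)   # pre-activation is 1.0
clipped = deepid_unit(x1, w1, x2, w2, b=-3.0)  # pre-activation is -1.0
```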

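Verification Algorithm

• Joint Bayesian

§ $x = \mu + \epsilon$, where $\mu$ is face identity and $\epsilon$ is intra-class variation

§ $\mu \sim N(0, S_\mu)$ and $\epsilon \sim N(0, S_\epsilon)$

§ Compute $r(x_1, x_2) = \log \frac{P(x_1, x_2 \mid H_I)}{P(x_1, x_2 \mid H_E)}$, which has a closed-form solution.

• Neural networks

[Figure: neural network verifier whose input is organized into 60 groups of highly correlated sub-features (640D).]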
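The Joint Bayesian log-likelihood ratio can be illustrated in the simplest possible setting. The sketch below assumes scalar features, so the two covariances reduce to variances `s_mu` and `s_eps` (hypothetical names and values). If two faces share an identity, their features are coupled through the identity variance. If the identities differ, the features are independent.

```python
import math

def log_gauss2(x1, x2, a, c):
    """Log-density of a zero-mean 2-D Gaussian with covariance [[a, c], [c, a]]."""
    det = a * a - c * c
    quad = (a * x1 * x1 - 2 * c * x1 * x2 + a * x2 * x2) / det
    return -0.5 * (quad + math.log(det)) - math.log(2 * math.pi)

def log_likelihood_ratio(x1, x2, s_mu, s_eps):
    """log P(x1, x2 | same identity) - log P(x1, x2 | different identities)."""
    total = s_mu + s_eps
    same = log_gauss2(x1, x2, total, s_mu)  # shared identity: covariance s_mu
    diff = log_gauss2(x1, x2, total, 0.0)   # independent identities
    return same - diff

s_mu, s_eps = 1.0, 0.1  # made-up variances: identity dominates intra-class noise
r_similar = log_likelihood_ratio(1.0, 1.0, s_mu, s_eps)
r_opposite = log_likelihood_ratio(1.0, -1.0, s_mu, s_eps)
```

A positive ratio favors the same-person hypothesis. Thresholding the ratio gives the binary verification decision.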

Comparison between Two Verifiers

• Joint Bayesian is better than neural network.

[Figure: test accuracy (%) versus number of classes for training.]

Results

r: restricted training protocol, where 6000 face pairs given by LFW are used for 10-fold cross-validation

u: unrestricted training protocol, where more training pairs can be generated from LFW using identity

o: using outside training data, however, without using training data from LFW

o+r: using both outside data and LFW data in the restricted protocol for training

o+u: using both outside data and LFW data in the unrestricted protocol for training

TL: Joint Bayesian transfer learning from CelebFaces+ to LFW

Comparison of state-of-the-art face verification methods on LFW

Method | Accuracy (%) | No. of points | No. of outside images | Feature dimension
Joint Bayesian [8] | 92.42 (o) | 5 | 99,773 | 2000 × 4
ConvNet-RBM [31] | 92.52 (o) | 3 | 87,628 | N/A
CMD+SLBP [17] | 92.58 (u) | 3 | N/A | 2302
Fisher vector faces [29] | 93.03 (u) | 9 | N/A | 128 × 2
Tom-vs-Pete classifiers [2] | 93.30 (o+r) | 95 | 20,639 | 5000
High-dim LBP [9] | 95.17 (o) | 27 | 99,773 | 2000
TL Joint Bayesian [6] | 96.33 (o+u) | 27 | 99,773 | 2000
DeepFace [32] | 97.25 (o+u) | 6 + 67 | 4,400,000 + 3,000,000 | 4096 × 4
DeepID on CelebFaces | 96.05 (o) | 5 | 87,628 | 150
DeepID on CelebFaces+ | 97.20 (o) | 5 | 202,599 | 150
DeepID on CelebFaces+ & TL | 97.45 (o+u) | 5 | 202,599 | 150

Human-level performance: 97.53

DeepID II

• Joint identification-verification

§ Face identification: increases the inter-personal variations by drawing DeepID2 features extracted from different identities apart

§ Face verification: reduces the intra-personal variations by pulling DeepID2 features extracted from the same identity together

• Feature extraction

$$f = \text{Conv}(x, \theta_c)$$

[Sun2014b] Y. Sun, Y. Chen, X. Wang, X. Tang: Deep Learning Face Representation by Joint Identification-Verification. NIPS 2014

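Training CNN

• Two loss functions

§ Identification loss: cross-entropy

§ Verification loss

$$f = \text{Conv}(x, \theta_c)$$

$$\text{Ident}(f, t, \theta_{id}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i = -\log \hat{p}_t$$

$$\text{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \begin{cases} \frac{1}{2} \left\| f_i - f_j \right\|_2^2 & \text{if } y_{ij} = 1 \\ \frac{1}{2} \max \left( 0, m - \left\| f_i - f_j \right\|_2 \right)^2 & \text{if } y_{ij} = -1 \end{cases}$$

$\theta_{id}$: parameters of softmax layer

$\theta_{ve} = \{m\}$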
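Both training signals can be sketched on toy inputs. This is an illustration under simplifying assumptions, not the authors' code: the identification loss takes raw class scores (`logits`, a hypothetical name) and applies a softmax, and the verification loss works directly on two given feature vectors. All values are made up.

```python
import math

def ident_loss(logits, target):
    """Cross-entropy: minus the log of the softmax probability of the target class."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[target]

def verif_loss(fi, fj, y, margin):
    """0.5 * d^2 for same-identity pairs (y = 1);
    0.5 * max(0, margin - d)^2 for different-identity pairs (y = -1)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(fi, fj)))
    if y == 1:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2

uniform = ident_loss([0.0, 0.0], 0)                     # two equally likely classes
pull = verif_loss([0.0, 0.0], [3.0, 4.0], 1, 6.0)       # same identity, d = 5
push = verif_loss([0.0, 0.0], [3.0, 4.0], -1, 6.0)      # different identity, d < margin
satisfied = verif_loss([0.0, 0.0], [3.0, 4.0], -1, 2.0) # different identity, d > margin
```

The first case of the verification loss pulls same-identity features together. The second pushes different identities apart only until they are at least the margin away from each other.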

Verification Algorithm

• Feature extraction

§ Detect 21 facial landmarks by the SDM algorithm and align faces globally

§ Crop 400 face patches with variations in positions, scales, color channels, and horizontal flipping

• ConvNet

§ 200 CNNs: generate 400 DeepID2 feature vectors with horizontal flipping

§ Feature vector: 160D

• Feature dimensionality reduction

§ Select 25 patches in a greedy manner

§ PCA from 25x160D to 180D

[Figure: the 25 selected face patches.]

Joint Bayesian for verification

Results on LFW

Method | Accuracy (%)
High-dim LBP [4] | 95.17 ± 1.13
TL Joint Bayesian [2] | 96.33 ± 1.08
DeepFace [21] | 97.35 ± 0.25
DeepID [20] | 97.45 ± 0.26
GaussianFace [13] | 98.52 ± 0.66
DeepID2 | 99.15 ± 0.13

Human-level performance: 97.53

FaceNet

• Architecture

§ Direct mapping between face images and embedded points

• Triplet loss

§ Using large margin nearest neighbor (LMNN)

[Schroff15] F. Schroff, D. Kalenichenko, J. Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering. CVPR 2015
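The triplet loss can be sketched as a hinge on the gap between the anchor-positive and anchor-negative squared distances. The embeddings and the margin `alpha` below are made up, and the sketch skips the normalization of embeddings that a full system would typically apply.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedded points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, alpha):
    """max(0, d^2(anchor, positive) - d^2(anchor, negative) + alpha)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + alpha)

anchor = [0.0, 0.0]
positive = [1.0, 0.0]       # same identity, squared distance 1
easy_negative = [0.0, 2.0]  # different identity, squared distance 4
hard_negative = [0.0, 1.0]  # different identity, squared distance 1

easy = triplet_loss(anchor, positive, easy_negative, alpha=0.2)
hard = triplet_loss(anchor, positive, hard_negative, alpha=0.2)
```

Only triplets that violate the margin, like the hard negative here, contribute to the loss.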

Discriminative vs. Generative CNN

• Discriminative CNN: image → CNN → object class, viewpoint, style, …

• Generative CNN: object class, viewpoint, style, … → CNN → image

Goal

• Generate an object based on high-level inputs such as

§ Class

§ Orientation with respect to camera

§ Additional parameters

• Rotation, translation, zoom

• Stretching horizontally or vertically

• Hue, saturation, brightness

• Knowledge transfer

§ Generative CNN learns the manifold of chairs.

§ Interpolation between viewpoints and different objects

[Dosovitskiy15] A. Dosovitskiy, J. T. Springenberg, T. Brox: Learning to Generate Chairs with Convolutional Neural Networks. CVPR 2015

Data

• Using 3D chair model dataset [Aubry14]

§ Original dataset: 1393 chair models, 62 viewpoints (31 azimuth angles × 2 elevation angles)

§ Sanitized version: 809 models, tight cropping, resizing to 128x128

• Notations

§ $D = \{ (c^1, v^1, \theta^1), (c^2, v^2, \theta^2), \ldots, (c^N, v^N, \theta^N) \}$

• $c$: class label

• $v$: viewpoint

• $\theta$: additional parameters

§ $O = \{ (x^1, s^1), (x^2, s^2), \ldots, (x^N, s^N) \}$

• $x$: target RGB output image

• $s$: segmentation mask

[Aubry14] M. Aubry, D. Maturana, A. Efros, and J. Sivic: Seeing 3D Chairs: Exemplar Part-based 2D-3D Alignment using a Large Dataset of CAD Models. CVPR 2014

Network Architecture

[Figure: the network consists of two stages, $h$ followed by $u$, so that $g = u \circ h$.]

32M parameters altogether

Operations

• Unpooling: 2x2

• Deconvolution: 5x5

• ReLU

[Figure: fixed location unpooling.]
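Fixed-location 2x2 unpooling can be sketched as follows. The sketch assumes that the fixed location is the top-left corner of each 2x2 block, with zeros elsewhere. That choice and the function name are illustrative.

```python
def unpool2x2(fmap):
    """Double the resolution: each value goes to the top-left of a 2x2 block of zeros."""
    out = []
    for row in fmap:
        top = []
        for value in row:
            top.extend([value, 0.0])
        out.append(top)
        out.append([0.0] * len(top))  # second row of each block is all zeros
    return out

small = [[1.0, 2.0],
         [3.0, 4.0]]
large = unpool2x2(small)
```

A 5x5 deconvolution applied after this step can then spread each value over its neighborhood, which is how the network grows a compact code into a full-resolution image.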

Training

• Objective function

§ Minimizing the Euclidean error in 2D of

• Reconstruction of the segmented-out chair image (RGB stream)

• Segmentation mask, $s$ (segmentation stream)

$$\min_W \sum_{i=1}^{N} \lambda \left\| u_{RGB}(h(c^i, v^i, \theta^i)) - T_{\theta^i}(x^i \cdot s^i) \right\|_2^2 + \left\| u_{segm}(h(c^i, v^i, \theta^i)) - T_{\theta^i} s^i \right\|_2^2$$

• Visualization of uconv-3 layer filters in 128x128 network

[Saxe14] A. M. Saxe, J. L. McClelland, and S. Ganguli: Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks. ICLR 2014

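Network Capacity

[Figure: generated chairs under translation, rotation, zoom, stretch, saturation, brightness, and color changes.]

Morphing Different Chairs

[Figure: morphing between different chairs, shown for viewpoints in the training set.]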

Autonomous Driving

• Two previous approaches

§ Mediated perception: parsing the entire scene to make a driving decision (e.g., Mobileye, Google)

§ Behavior reflex: directly mapping an input image to a driving decision by a regressor (ALVINN, LeCun et al.)

[Figure: three routes from input image to driving control: mediated perception, direct perception (ours), and behavior reflex.]

[Chen15] C. Chen, A. Seff, A. Kornhauser, J. Xiao: DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving. ICCV 2015

DeepDriving

• Direct perception

§ Estimating the affordance for driving

§ Simple input to model using a few key perception indicators

§ Compact yet complete descriptions of the scene for vehicle control

• Approach

§ Built upon deep convolutional neural network

§ Trained and tested on TORCS (The Open Racing Car Simulator)

§ Learning for estimating affordance related to autonomous driving

§ Simpler than the mediated perception approach

§ More interpretable than the typical behavior reflex approach

Platform

• System architecture

[Figure: TORCS writes the image and speed to shared memory. The CNN reads the image and outputs the indicators (angle, toMarking, dist, ...). The driving controller reads the indicators and the speed, and writes the driving controls as the controller output.]

• Environment

§ Focusing on highway driving with multiple lanes

§ Three configurations: a road of one lane, two lanes, or three lanes

[Figure: (a) one-lane, (b) two-lane, left, (c) two-lane, right, (d) three-lane, (e) inner lane mark., (f) boundary lane mark.]

Convolutional Neural Network

• Prediction of affordance indicator

[Figure: affordance indicators. (a) angle. (b) in lane: toMarking (toMarking_LL, toMarking_ML, toMarking_MR, toMarking_RR). (c) in lane: dist (dist_LL, dist_MM, dist_RR). (d) on marking: toMarking (toMarking_L, toMarking_M, toMarking_R). (e) on marking: dist (dist_L, dist_R). (f) overlapping area between the activation ranges of the in lane system and the on marking system.]

[Figure: the CNN outputs angle, toMarking_LL, dist_LL, ..., toMarking_L, dist_L, ... The indicators are grouped as always active, in lane system, and on marking system.]

Visualization of Learned Models

[Figure: response map of the KITTI-based ConvNet model and response map of the TORCS-based ConvNet model.]

DeepDriving Demo

Analogy

PARIS : FRANCE :: BEIJING : CHINA

[Figure: embedding space in which the offset from Paris to France matches the offset from Beijing to China.]

Slide credit: Scott Reed

Visual Analogy Making

[Figure: image analogies of the form A : B :: C : D for changing color, changing shape, and changing size, followed by a query A : B :: C : ? to be completed.]


Visual Analogy Making

• Concept

§ Learns an encoder function $f: \mathbb{R}^D \to \mathbb{R}^K$ mapping images into a space where analogies can be performed

§ Learns a decoder $g: \mathbb{R}^K \to \mathbb{R}^D$ mapping back to the image space

[Figure: two stages, infer relationship and transform query.]

[Reed15] S. Reed, Y. Zhang, Y. Zhang, H. Lee: Deep Visual Analogy Making. NIPS 2015

$$d = \arg\max_{w \in V} \cos \left( f(w), f(b) - f(a) + f(c) \right)$$
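The retrieval rule can be sketched with a toy vocabulary. The 2-D embeddings below are hypothetical and chosen by hand so that the capital-to-country offset is shared. In practice the embedding function would be learned.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def complete_analogy(emb, a, b, c):
    """Return the entry w maximizing cos(emb[w], emb[b] - emb[a] + emb[c])."""
    target = [fb - fa + fc for fa, fb, fc in zip(emb[a], emb[b], emb[c])]
    return max(emb, key=lambda w: cosine(emb[w], target))

emb = {  # hypothetical embeddings
    "Paris": [1.0, 1.0],
    "France": [1.0, 3.0],
    "Beijing": [4.0, 1.0],
    "China": [4.0, 3.0],
    "Tokyo": [6.0, 1.0],
}

answer = complete_analogy(emb, "Paris", "France", "Beijing")
```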

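Architecture

$$\mathcal{L}_{add} = \sum_{(a,b,c,d) \in A} \left\| d - g \left( f(b) - f(a) + f(c) \right) \right\|_2^2$$

$$\mathcal{L}_{mul} = \sum_{(a,b,c,d) \in A} \left\| d - g \left( f(c) + W \times_1 [f(b) - f(a)] \times_2 f(c) \right) \right\|_2^2$$

$$\mathcal{L}_{deep} = \sum_{(a,b,c,d) \in A} \left\| d - g \left( f(c) + h([f(b) - f(a); f(c)]) \right) \right\|_2^2$$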

Optimization

• Regularization

§ For accurate analogy completion by image manifold traversing

§ Making transformation match the difference of encoder embeddings

$$R = \sum_{(a,b,c,d) \in A} \left\| f(d) - f(c) - T(f(a), f(b), f(c)) \right\|_2^2$$

$$T(x, y, z) = \begin{cases} y - x & \text{for } \mathcal{L}_{add} \\ W \times_1 [y - x] \times_2 z & \text{for } \mathcal{L}_{mul} \\ \text{MLP}([y - x; z]) & \text{for } \mathcal{L}_{deep} \end{cases}$$

• Training

§ With backpropagation using SGD

§ Combined loss: $\mathcal{L} + \alpha R$, $\alpha = 0.01$

Algorithm 1: Manifold traversal by analogy, with transformation function T (Eq. 5)

    Given images a, b, c, and N (# steps)
    z ← f(c)
    for i = 1 to N do
        z ← z + T(f(a), f(b), z)
        x_i ← g(z)
    return generated images x_i (i = 1, ..., N)
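Algorithm 1 can be sketched in a few lines once an encoder, a decoder, and an increment function are available. In the sketch below the encoder and decoder are identity stand-ins (a loud simplification: real ones are trained networks), only the additive increment is implemented, and the "images" are made-up 2-D vectors.

```python
def t_add(fa, fb, z):
    """Additive increment: embedding of b minus embedding of a (z is unused here)."""
    return [y - x for x, y in zip(fa, fb)]

def traverse(f, g, T, a, b, c, n_steps):
    """Repeatedly apply the increment in embedding space and decode every step."""
    fa, fb = f(a), f(b)
    z = f(c)
    outputs = []
    for _ in range(n_steps):
        z = [zi + ti for zi, ti in zip(z, T(fa, fb, z))]
        outputs.append(g(z))
    return outputs

def identity(v):
    """Hypothetical stand-in for both the encoder and the decoder."""
    return list(v)

steps = traverse(identity, identity, t_add,
                 a=[0.0, 0.0], b=[1.0, 0.5], c=[2.0, 2.0], n_steps=3)
```

With the additive increment every step moves by the same offset. The multiplicative and deep increments take the current embedding as input, so the step can change along the trajectory.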


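Shape Predictions: Additive Model

[Figure: rotate, scale, and shift analogies. Columns: ref, out, query, and predictions at t=1, t=2, t=3, t=4.]

Shape Predictions: Multiplicative Model

[Figure: rotate, scale, and shift analogies, with predictions at t=2, t=3, t=4.]

Shape Predictions: Deep Model

[Figure: rotate, scale, and shift analogies, with predictions at t=2, t=3, t=4.]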

Results for Analogy Models

• Transforming shapes

Model | Rotation steps (1, 2, 3, 4) | Scaling steps (1, 2, 3, 4) | Translation steps (1, 2, 3, 4)
L_add | 8.39, 11.0, 15.1, 21.5 | 5.57, 6.09, 7.22, 14.6 | 5.44, 5.66, 6.25, 7.45
L_mul | 8.04, 11.2, 13.5, 14.2 | 4.36, 4.70, 5.78, 14.8 | 4.24, 4.45, 5.24, 6.90
L_deep | 1.98, 2.19, 2.45, 2.87 | 3.97, 3.94, 4.37, 11.9 | 3.84, 3.81, 3.96, 4.61

[Figure: for rotation (+rot), scaling (+scl), and translation (+trans): ref, ground-truth output (gt), query, and four successive predictions.]

Learning Disentangled Features

• Objective function

$$\mathcal{L}_{dis} = \sum_{(a,b,c) \in B} \left\| c - g \left( s \cdot f(a) + (1 - s) \cdot f(b) \right) \right\|_2^2$$

Algorithm 2: Disentangling training update. The switches s determine which units from f(a) and f(b) are used to reconstruct image c.

    Given input images a, b and target c
    Given switches s ∈ {0, 1}^K
    z ← s · f(a) + (1 − s) · f(b)
    Δθ ∝ ∂/∂θ (||g(z) − c||_2^2)
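The switch mechanism can be sketched as an elementwise selection between two embeddings, followed by a reconstruction error. The sketch uses made-up 4-D embeddings and an identity decoder as a hypothetical stand-in. The split into "identity units" and "pose units" is assumed for illustration, and the gradient step is omitted.

```python
def mix(s, fa, fb):
    """Take unit k from the first embedding when s[k] == 1, else from the second."""
    return [sk * x + (1 - sk) * y for sk, x, y in zip(s, fa, fb)]

def recon_error(z, target, g):
    """Squared error between the decoded mixture and the target image."""
    return sum((p - q) ** 2 for p, q in zip(g(z), target))

def identity(v):
    """Hypothetical stand-in for the decoder."""
    return list(v)

fa = [1.0, 2.0, 3.0, 4.0]  # embedding of image a
fb = [5.0, 6.0, 7.0, 8.0]  # embedding of image b
s = [1, 1, 0, 0]           # first two units from a, last two from b

z = mix(s, fa, fb)
err = recon_error(z, [1.0, 2.0, 7.0, 9.0], identity)
```

Training with different switch settings pushes each group of units to carry one factor of variation on its own.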

Disentangling and Analogy Making

[Figure: images a, b, and c are each encoded into identity and pose units. The increment function T is applied to produce d. Identity is disentangled from pose.]

Classification and Analogy Making

[Figure: same layout as above, with an attribute classifier attached to the identity units, giving a separate classification for identity.]

Slide credit: Scott Reed

Results for Disentangled Features

• Transferring animation

§ Disentangling pose from identity

§ Pose transformations are modeled by deep additive interactions

Model | spellcast | thrust | walk | slash | shoot | average
L_add | 41.0 | 53.8 | 55.7 | 52.1 | 77.6 | 56.0
L_dis | 40.8 | 55.8 | 52.6 | 53.5 | 79.8 | 56.5
L_dis+cls | 13.3 | 24.6 | 17.2 | 18.9 | 40.8 | 23.0

Results for Extrapolation

[Figure: extrapolation results for walk, thrust, and rotate. Columns: ref, output, query, and predictions from -4 to +4 steps.]

Summary

• Proposing novel deep architectures that can perform visual analogy making by simple operations in an embedding space

§ Convolutional encoder-decoder networks

§ Modeling transformations by vector addition in embedding space works for simple problems, but multi-layer neural networks are better.

• Combining analogy and disentangling training methods

§ Analogy representations can overcome limitations of disentangled representations by learning transformation manifold.
