fashion classification and detection using convolutional...

Post on 29-May-2020

15 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

FashionClassificationandDetectionUsingConvolutionalNeuralNetworks

Prateek Srivastava&Archit Arora

November10,2016

FashionClassificationProblem

• Given an image, identify the apparel a particular person is wearing

Applications

• Targetedads

• Productrecommendations

• IdentificationofsuspectsthroughCCTVfootage

• Brandadoptiontrendforclothingcompanies

Challenges

• Intra-classvariation• Viewpointandlocationalvariation• Scalevariation• Backgroundclutter

Challenges

• Intra-classvariation• Viewpointandlocationalvariation• Scalevariation• Backgroundclutter

Challenges

• Intra-classvariation• Viewpointandlocationalvariation• Scalevariation• Backgroundclutter

Challenges

• Intra-classvariation• Viewpointandlocationalvariation• Scalevariation• Backgroundclutter

Overviewoffashionclassificationprocess

• Datacollection• Dataaugmentation• Learningfromdata• Imageclassification&objectdetection

Datacollection

• UsedScrapy tooltosearchandcrawlMacy’swebsite• Generatedalabelleddatasetofaround3500imagesbelongingtothesecategories

Jacket Long-dress Poloshirts Suit Short-dress

T-shirt Sweater Trousers Shoes Robe

Shirt Jeans Swimsuit Shorts

14 Fashion Categories

Dataaugmentation/preprocessing

• Implementedtoimproverobustnessoftheclassifier• Effectsintroducedtoaccountfordeviationsfromtrainingdataset:

1. Backgroundnoise(whitebackgroundconvertedtonoisybackground)2. Randomjitter(Randomjittercontrast:ApplyPCAtoallpixelsintraining

set,samplea”coloroffset”alongprincipalcomponentandthenaddoffsettoallpixels)

3. Rotation4. Contrastchange5. Croptotacklescalevariances6. Resizingtotackleresolutionvariances7. Imagenormalization

(subtractingthemeanfromrespectiveRGBchannels)

• scipy,scikit-imagewereusedtoaddtheaboveeffects

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer(pixelsofeachimage)

Learningfromdata:Multi-layerperceptron

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer(pixelsofeachimage)

w11

w12

w13

w14

Learningfromdata:Multi-layerperceptron

w11

w12

w13

w14

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer

w11

w31

w21

Learningfromdata:Multi-layerperceptron

Non-linearactivationfunctiontoidentify

complexdatapatterns.Eg:f(x)=max(0,x)(ReLu)

w11

w12

w13

w14

w11

w31

w21

w11

w12

w13

w14

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer

Learningfromdata:Multi-layerperceptron

w11

w12

w13

w14

w11

w31

w21

w11

w12

w13

w14

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer

w11

w12

w13

w14

Learningfromdata:Multi-layerperceptron

w11

w12

w13

w14

w11

w31

w21

w11

w12

w13

w14

w11

w12

w13

w14

HiddenLayer-1 HiddenLayer-2

OutputLayer

InputLayer

w11

w12

w13

w14

Learningfromdata:Multi-layerperceptron

Softmax functiontoassign

probabilitiestoclasses.

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

Imageinputasvolume

MainIssue:Multi-LayerPerceptrondoesnotscalewelltoimagedata.UseCNNs!

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

Depthoffilteristhesamethedepthofinputvolume

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

Evaluatesimagesimilaritywithfiltertemplate

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

Non-linearactivation

functionlikeReLu

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

Non-linearactivation

functionlikeReLu

Learningfromdata:ConvolutionalNeuralNetworks(CNN)

[SlidestakenfromLec-7CS231NConvolutionalNeuralNetworksforComputerVision]

StepsinvolvedinTrainingData

• DefineLossfunctionforminimization:Cross-entropyorNegativeLikelihoodused

𝐿 𝒙, 𝒚,𝑾 =1𝑛))𝐼{,-./}

/∈3

log 𝑃(𝑌: = 𝑐|=

:.>

𝑥:) + 𝝀| 𝑾 |𝟐

• Usemini-batchSGDtominimizeloss• Initializeweights𝑾𝟎forallthelayers• Updaterule:

𝑾(𝒌G𝟏) = 𝑾(𝒌) − 𝒔𝒌(𝜵𝑾𝑳(𝑾))

Gradient(Difficulttocomputedirectly)

GradientComputationsbasedonComputationalGraphs

2

1

-1

0.73𝟏

𝒑 = 𝒘𝟎𝒙𝟎

𝒒 = 𝒑+𝒘𝟏

𝒛 =𝟏

𝟏+ 𝒆R𝒒 =

𝟐

𝑓 𝑥T,𝑤T,𝑤> =1

1 + 𝑒R(WXYXGWZ)Gradientof (2,1,1)at

ForwardPropagation

GradientComputationsbasedonComputationalGraphs

2

1

-1

0.196

0.196

0.196

0.392

0.196

𝒑 = 𝟐

𝒛 = 0.73𝒒 = 𝟏

𝒅𝒛𝒅𝒒 =

𝒅𝒛𝒅𝒘𝟏

=𝒅𝒛𝒅𝒒

𝒅𝒒𝒅𝒘𝟏

=

𝒅𝒛𝒅𝒑 =

𝒅𝒛𝒅𝒒

𝒅𝒒𝒅𝒑

𝒅𝒛𝒅𝒙𝟎

=𝒅𝒛𝒅𝒑

𝒅𝒑𝒅𝒙𝟎

𝒅𝒛𝒅𝒘𝟎

=𝒅𝒛𝒅𝒑

𝒅𝒑𝒅𝒘𝟎

𝑓 𝑥T,𝑤T,𝑤> =1

1 + 𝑒R(WXYXGWZ)Gradientof (2,1,1)at

MainIdea:Calculateglobalgradientsusingchainruleonlocalgradients

BackwardPropagation

UsingPre-trainedmodels

• Inpractice,trainingCNNsfromscratchishardlyusedduetocomputationalconsiderations• Instead,pre-trainedmodelstrainedonotherdatasetareused• Mainidea:Lowlevelfeaturesarecommontoallimageclassificationproblems

• ForourprojectweusedVGG-16modeltrainedforImageNetclassificationchallengeon1000categories

VGG-16Architecture

• Input: 224x224 size images with 3 channels1. Convolutional (224x224x64)2. Convolutional (224x224x64)Max Pool (112x112x64)3. Convolutional (112x112x128)4. Convolutional (112x112x128)Max Pool (56x56x128)5. Convolutional (56x56x256)6. Convolutional (56x56x256)7. Convolutional (56x56x256)Max Pool (28x28x256)8. Convolutional (28x28x512)9. Convolutional (28x28x512)10. Convolutional (28x28x512)Max Pool (14x14x512)

11. Convolutional (14x14x512)12. Convolutional (14x14x512)13. Convolutional (14x14x512)Max Pool (7x7x512)14. Fully Connected (1x1x4096)15. Fully Connected (1x1x4096)16. Fully Connected (1x1x1000) Soft-max (1x1x1000)

Reducesspatialdimensions

Implementation:ModifiedVGG-16

• Input: 224x224 size images with 3 channels1. Convolutional (224x224x64)2. Convolutional (224x224x64)Max Pool (112x112x64)3. Convolutional (112x112x128)4. Convolutional (112x112x128)Max Pool (56x56x128)5. Convolutional (56x56x256)6. Convolutional (56x56x256)7. Convolutional (56x56x256)Max Pool (28x28x256)8. Convolutional (28x28x512)9. Convolutional (28x28x512)10. Convolutional (28x28x512)Max Pool (14x14x512)

11. Convolutional (14x14x512)12. Convolutional (14x14x512)13. Convolutional (14x14x512)Max Pool (7x7x512)14. Fully Connected (1x1x4096)15. Fully Connected (1x1x4096)16. Fully Connected (1x1x1000) Soft-max (1x1x1000)

Treatthepreviouslayers(weights)asfixedandtunetheparametersonthelast3fullyconnectedlayers

Implementation:ModifiedVGG-16

• Input: 224x224 size images with 3 channels1. Convolutional (224x224x64)2. Convolutional (224x224x64)Max Pool (112x112x64)3. Convolutional (112x112x128)4. Convolutional (112x112x128)Max Pool (56x56x128)5. Convolutional (56x56x256)6. Convolutional (56x56x256)7. Convolutional (56x56x256)Max Pool (28x28x256)8. Convolutional (28x28x512)9. Convolutional (28x28x512)10. Convolutional (28x28x512)Max Pool (14x14x512)

11. Convolutional (14x14x512)12. Convolutional (14x14x512)13. Convolutional (14x14x512)Max Pool (7x7x512)14. Fully Connected (1x1x4096)15. Fully Connected (1x1x4096)16. Fully Connected (1x1x1000) Soft-max (1x1x1000)

ImplementedCNNarchitectureusingTheano(library)andLasagne(wrapper)withCUDAGPUonAWSEC2server

Cross-entropylossversusnumberofiterationsusingNesterov momentumupdaterule

ResultsusingMacy’sdata

• Initially,resultswereobservedusingthetrain-testsplitwithinMacy’s

data(withoutdataaugmentation)

• ~92%accuracywasachieved

ResultsusingMacy’sdata

TrainingData TestData

ResultsusingMacy’sdata

• Resultswereexceptionallybad(~20%accuracy)whenusingthesametrainingdatatopredictatestdataofreallifeexamplesfromGoogleImages• Hence,therewasaneedfordataaugmentation

Resultsfortestdataafterdataaugmentation

• Thetestdataconsistedofaround60imagesfromGoogleImages

comprisingofrealworldphotos

• ~71%accuracywasachieved

Cross-entropylossversusnumberofiterationsusingNesterov momentumupdaterule

Resultsfortestdataafterdataaugmentation

NextSteps…

• Extendanalysistomultipleclothingobjectdetection• Approach:UseaslidingwindowapproachwithfastlinearclassifierstoidentifyregionsofinterestandapplyCNNs• DeterminespecificobjectlocationusingregressionheadwithaL2lossfunctioninsteadofclassificationheadwithcross-entropyloss

THANKYOU

Label Category

0 Jacket

1 T-Shirt

2 Shirt

3 Jeans

4 Long-dress

5 Shoes

6 Trousers

7 Polo-shirts

8 Robe

9 Short-dress

10 Shorts

11 Suit

12 Sweater

13 Swimsuit

ResultsusingMacy’sdata

top related