Post on 29-May-2020
Fashion Classification and Detection Using Convolutional Neural Networks
Prateek Srivastava & Archit Arora
November 10, 2016
Fashion Classification Problem
• Given an image, identify the apparel a particular person is wearing
Applications
• Targeted ads
• Product recommendations
• Identification of suspects through CCTV footage
• Brand adoption trends for clothing companies
Challenges
• Intra-class variation
• Viewpoint and locational variation
• Scale variation
• Background clutter
Overview of the fashion classification process
• Data collection
• Data augmentation
• Learning from data
• Image classification & object detection
Data collection
• Used the Scrapy tool to search and crawl Macy’s website
• Generated a labelled dataset of around 3500 images belonging to these 14 fashion categories:
Jacket, Long-dress, Polo-shirts, Suit, Short-dress, T-shirt, Sweater, Trousers, Shoes, Robe, Shirt, Jeans, Swimsuit, Shorts
Data augmentation / preprocessing
• Implemented to improve the robustness of the classifier
• Effects introduced to account for deviations from the training dataset:
1. Background noise (white background converted to a noisy background)
2. Random jitter (random contrast jitter: apply PCA to all pixels in the training set, sample a "color offset" along a principal component, and then add the offset to all pixels)
3. Rotation
4. Contrast change
5. Crop to tackle scale variances
6. Resizing to tackle resolution variances
7. Image normalization (subtracting the mean from the respective RGB channels)
• scipy and scikit-image were used to add the above effects
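Two of the effects listed above (noisy background and per-channel mean subtraction) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python lists of RGB tuples instead of the scipy and scikit-image routines the slides mention, and the function names and the `white_threshold` parameter are hypothetical.

```python
import random

def add_background_noise(image, rng, white_threshold=250):
    """Replace near-white background pixels with uniform random noise."""
    noisy = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            if min(r, g, b) >= white_threshold:  # assumed test for "white"
                new_row.append(tuple(rng.randrange(256) for _ in range(3)))
            else:
                new_row.append((r, g, b))
        noisy.append(new_row)
    return noisy

def channel_means(images):
    """Mean of each RGB channel over every pixel of every image."""
    totals, count = [0.0, 0.0, 0.0], 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(3):
                    totals[c] += pixel[c]
                count += 1
    return [t / count for t in totals]

def subtract_mean(image, means):
    """Image normalization: subtract the per-channel mean from each pixel."""
    return [[tuple(p[c] - means[c] for c in range(3)) for p in row]
            for row in image]

rng = random.Random(0)
# Toy 2x2 "product shot": white background on the left, garment on the right.
product_shot = [[(255, 255, 255), (200, 30, 40)],
                [(255, 255, 255), (190, 20, 50)]]
noisy = add_background_noise(product_shot, rng)
means = channel_means([product_shot])
normalized = subtract_mean(product_shot, means)
```

Only the white pixels are replaced, so the garment pixels survive unchanged while the background no longer looks like a studio shot.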
Learning from data: Multi-layer perceptron
[Figure: fully connected network with an input layer (pixels of each image), hidden layer 1, hidden layer 2, and an output layer; the connections carry weights such as w11, w12, w13, w14 and w11, w21, w31]
• Non-linear activation function to identify complex data patterns, e.g. f(x) = max(0, x) (ReLU)
• Softmax function to assign probabilities to classes
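The forward pass of such a network can be sketched as follows. The layer sizes, the random initialization, and all function names are assumptions chosen for illustration; only the structure (two hidden layers with ReLU, softmax over 14 classes) follows the slides.

```python
import math
import random

def relu(values):
    """f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: weights[j][i] links input i to unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(pixels, layers):
    activations = pixels
    for weights, biases in layers[:-1]:  # hidden layers use ReLU
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]         # output layer uses softmax
    return softmax(dense(activations, weights, biases))

rng = random.Random(0)
sizes = [12, 8, 6, 14]  # toy sizes: 12 pixels, two hidden layers, 14 classes
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
pixels = [rng.random() for _ in range(sizes[0])]
probabilities = mlp_forward(pixels, layers)
```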
Learning from data: Convolutional Neural Networks (CNN)
• Main issue: a multi-layer perceptron does not scale well to image data. Use CNNs!
• The image input is treated as a volume
• The depth of a filter is the same as the depth of the input volume
• Each filter evaluates the similarity of an image region with the filter template
• A non-linear activation function such as ReLU is applied to the result
[Slides taken from Lec-7, CS231N Convolutional Neural Networks for Computer Vision]
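A single filter sliding over an input volume can be sketched like this. It is a deliberately naive version that assumes stride 1 and no padding, with a hypothetical function name; it only shows that the filter spans the full input depth and that each output value is a dot product between the filter template and one image patch, followed by ReLU.

```python
def conv2d_single_filter(volume, kernel, bias=0.0):
    """Slide one k x k x depth filter over volume[y][x][d] (stride 1, no padding)."""
    height, width = len(volume), len(volume[0])
    k = len(kernel)
    depth = len(kernel[0][0])
    assert depth == len(volume[0][0]), "filter depth must match input depth"
    activation_map = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            total = bias
            # Dot product between the filter template and the current patch.
            for ky in range(k):
                for kx in range(k):
                    for d in range(depth):
                        total += volume[y + ky][x + kx][d] * kernel[ky][kx][d]
            row.append(max(0.0, total))  # ReLU non-linearity
        activation_map.append(row)
    return activation_map

# Toy 3x3 input with depth 2; each pixel stores its own (row, column) index.
volume = [[[float(y), float(x)] for x in range(3)] for y in range(3)]
ones = [[[1.0, 1.0] for _ in range(2)] for _ in range(2)]
activation_map = conv2d_single_filter(volume, ones)
```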
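Steps involved in training
• Define a loss function for minimization: cross-entropy, i.e. negative log-likelihood, is used

$$L(x, y, W) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c \in C} I\{y_i = c\} \log P(Y_i = c \mid x_i) + \lambda \lVert W \rVert^2$$

• Use mini-batch SGD to minimize the loss
• Initialize weights $W^{(0)}$ for all the layers
• Update rule: $W^{(k+1)} = W^{(k)} - s_k \nabla_W L(W^{(k)})$
• The gradient $\nabla_W L$ is difficult to compute directly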
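The loss and the update rule can be tried out on a model small enough to differentiate by hand. The sketch below assumes a linear softmax classifier (not a CNN), a constant step size, and a toy one-dimensional dataset; all names and hyperparameter values are illustrative.

```python
import math
import random

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_gradient(W, batch, lam):
    """Regularized cross-entropy loss and its gradient for scores = W x."""
    n = len(batch)
    loss = lam * sum(w * w for row in W for w in row)
    grad = [[2.0 * lam * w for w in row] for row in W]
    for x, y in batch:
        probs = softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])
        loss -= math.log(probs[y]) / n
        for c, row in enumerate(grad):
            error = probs[c] - (1.0 if c == y else 0.0)
            for j, xj in enumerate(x):
                row[j] += error * xj / n
    return loss, grad

def minibatch_sgd(W, data, lam=1e-3, step=0.2, batch_size=4, epochs=50, seed=0):
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            _, grad = loss_and_gradient(W, data[start:start + batch_size], lam)
            # Update rule: W <- W - step * gradient
            W = [[w - step * g for w, g in zip(w_row, g_row)]
                 for w_row, g_row in zip(W, grad)]
    return W

# Toy data: feature vector [x, 1.0] (the 1.0 acts as a bias), class 1 if x > 0.
data = [([x, 1.0], 0 if x < 0 else 1)
        for x in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]
W0 = [[0.0, 0.0], [0.0, 0.0]]
initial_loss, _ = loss_and_gradient(W0, data, lam=1e-3)
W = minibatch_sgd(W0, data)
final_loss, _ = loss_and_gradient(W, data, lam=1e-3)
```

With all-zero weights both classes get probability 0.5, so the initial loss equals log 2; training should drive it lower.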
Gradient Computations based on Computational Graphs
Forward propagation: gradient of $f(x_0, w_0, w_1) = \frac{1}{1 + e^{-(w_0 x_0 + w_1)}}$ at $(x_0, w_0, w_1) = (2, 1, -1)$
• $p = w_0 x_0 = 2$
• $q = p + w_1 = 1$
• $z = \frac{1}{1 + e^{-q}} = 0.73$
Backward propagation, with $p = 2$, $q = 1$ and $z = 0.73$ from the forward pass:
• $\frac{dz}{dq} = z(1 - z) = 0.196$
• $\frac{dz}{dw_1} = \frac{dz}{dq} \frac{dq}{dw_1} = 0.196$
• $\frac{dz}{dp} = \frac{dz}{dq} \frac{dq}{dp} = 0.196$
• $\frac{dz}{dx_0} = \frac{dz}{dp} \frac{dp}{dx_0} = 0.196$
• $\frac{dz}{dw_0} = \frac{dz}{dp} \frac{dp}{dw_0} = 0.392$
• Main idea: calculate global gradients using the chain rule on local gradients
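The same forward and backward passes can be written out node by node. The sketch assumes the inputs are x0 = 2, w0 = 1, w1 = -1, which is the assignment consistent with the node values p = 2, q = 1 and z = 0.73; the function names are illustrative.

```python
import math

def forward(x0, w0, w1):
    """Evaluate the graph node by node."""
    p = w0 * x0
    q = p + w1
    z = 1.0 / (1.0 + math.exp(-q))
    return p, q, z

def backward(x0, w0, z):
    """Chain rule: multiply each local gradient by the gradient flowing in."""
    dz_dq = z * (1.0 - z)   # local gradient of the sigmoid
    dz_dw1 = dz_dq * 1.0    # dq/dw1 = 1
    dz_dp = dz_dq * 1.0     # dq/dp = 1
    dz_dx0 = dz_dp * w0     # dp/dx0 = w0
    dz_dw0 = dz_dp * x0     # dp/dw0 = x0
    return {"x0": dz_dx0, "w0": dz_dw0, "w1": dz_dw1}

x0, w0, w1 = 2.0, 1.0, -1.0
p, q, z = forward(x0, w0, w1)
grads = backward(x0, w0, z)

# Finite-difference check of dz/dw0 as a sanity test.
eps = 1e-6
numeric_dw0 = (forward(x0, w0 + eps, w1)[2] - forward(x0, w0 - eps, w1)[2]) / (2 * eps)
```

Without intermediate rounding, dz/dw0 comes out near 0.393; the 0.392 on the slide is twice the rounded value 0.196.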
Using pre-trained models
• In practice, CNNs are rarely trained from scratch because of the computational cost
• Instead, models pre-trained on other datasets are used
• Main idea: low-level features are common to all image classification problems
• For our project we used the VGG-16 model trained for the ImageNet classification challenge on 1000 categories
VGG-16 Architecture
• Input: 224x224 size images with 3 channels
1. Convolutional (224x224x64)
2. Convolutional (224x224x64)
Max Pool (112x112x64)
3. Convolutional (112x112x128)
4. Convolutional (112x112x128)
Max Pool (56x56x128)
5. Convolutional (56x56x256)
6. Convolutional (56x56x256)
7. Convolutional (56x56x256)
Max Pool (28x28x256)
8. Convolutional (28x28x512)
9. Convolutional (28x28x512)
10. Convolutional (28x28x512)
Max Pool (14x14x512)
11. Convolutional (14x14x512)
12. Convolutional (14x14x512)
13. Convolutional (14x14x512)
Max Pool (7x7x512)
14. Fully Connected (1x1x4096)
15. Fully Connected (1x1x4096)
16. Fully Connected (1x1x1000)
Soft-max (1x1x1000)
• Each Max Pool layer reduces the spatial dimensions
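The output sizes in the list can be reproduced with a short shape walk. The slides do not give kernel sizes; the sketch relies on the standard VGG-16 design of 3x3 convolutions with padding 1 (spatial size preserved) and 2x2 max pooling with stride 2 (spatial size halved). The names are illustrative.

```python
# (number of conv layers, number of filters) for each of the five blocks
VGG16_CONV_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
FC_SIZES = [4096, 4096, 1000]

def vgg16_summary(size=224, channels=3):
    """Return the per-layer output shapes and the total parameter count."""
    shapes, params = [], 0
    for n_convs, filters in VGG16_CONV_BLOCKS:
        for _ in range(n_convs):
            params += 3 * 3 * channels * filters + filters  # weights + biases
            channels = filters
            shapes.append(("conv", size, size, channels))
        size //= 2  # 2x2 max pool halves height and width
        shapes.append(("pool", size, size, channels))
    features = size * size * channels  # flattened 7x7x512 volume
    for units in FC_SIZES:
        params += features * units + units
        features = units
        shapes.append(("fc", 1, 1, units))
    return shapes, params

shapes, total_params = vgg16_summary()
for kind, h, w, c in shapes:
    print(f"{kind:4s} {h}x{w}x{c}")
print("parameters:", total_params)
```

Most of the parameters sit in the first fully connected layer, which connects the 7x7x512 volume to 4096 units.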
Implementation: Modified VGG-16
• Treat the weights of the previous layers as fixed and tune the parameters of the last 3 fully connected layers
• Implemented the CNN architecture using Theano (library) and Lasagne (wrapper) with a CUDA GPU on an AWS EC2 server
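The idea of fixing the earlier layers and tuning only the last three can be illustrated without any deep learning library. The sketch below is not Theano or Lasagne code: each layer's parameters are stood in for by a single number, and the layer names and helper function are hypothetical.

```python
# Hypothetical layer names: 13 convolutional layers, then 3 fully connected.
LAYER_NAMES = ["conv%d" % i for i in range(1, 14)] + ["fc14", "fc15", "fc16"]
TRAINABLE = {"fc14", "fc15", "fc16"}  # only these layers are fine-tuned

def sgd_step(params, grads, learning_rate, trainable):
    """Apply a gradient step to trainable layers; leave frozen layers as-is."""
    return {
        name: value - learning_rate * grads[name] if name in trainable else value
        for name, value in params.items()
    }

params = {name: 1.0 for name in LAYER_NAMES}  # scalar stand-ins for weights
grads = {name: 0.5 for name in LAYER_NAMES}
updated = sgd_step(params, grads, learning_rate=0.1, trainable=TRAINABLE)
```

Frozen layers keep their pre-trained values, so only the fully connected layers adapt to the new categories.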
[Figure: cross-entropy loss versus number of iterations using the Nesterov momentum update rule]
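The Nesterov momentum update rule named in the caption can be sketched in one common formulation, shown here on a one-dimensional quadratic. The learning rate, the momentum value, and the test function are assumptions for illustration, not the settings used in the project.

```python
def nesterov_momentum_step(w, v, grad, learning_rate, momentum):
    """One Nesterov momentum update for parameter w with velocity v."""
    v_new = momentum * v - learning_rate * grad
    # Look-ahead form: move along the new velocity plus a plain gradient step.
    w_new = w + momentum * v_new - learning_rate * grad
    return w_new, v_new

def minimize(grad_fn, w0, learning_rate=0.1, momentum=0.9, steps=200):
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = nesterov_momentum_step(w, v, grad_fn(w), learning_rate, momentum)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_star = minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```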
Results using Macy’s data
• Initially, results were observed using the train-test split within Macy’s data (without data augmentation)
• ~92% accuracy was achieved
[Figure panels: Training Data, Test Data]
• Results were exceptionally bad (~20% accuracy) when the model trained on the same Macy’s data was used to predict a test set of real-life examples from Google Images
• Hence, there was a need for data augmentation
Results for test data after data augmentation
• The test data consisted of around 60 images from Google Images, comprising real-world photos
• ~71% accuracy was achieved
[Figure: cross-entropy loss versus number of iterations using the Nesterov momentum update rule]
Next Steps…
• Extend the analysis to detection of multiple clothing objects
• Approach: use a sliding window approach with fast linear classifiers to identify regions of interest, then apply CNNs
• Determine the specific object location using a regression head with an L2 loss function instead of a classification head with cross-entropy loss
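The two ingredients of this plan, candidate windows and an L2 loss on box coordinates, can be sketched as below. The window size, the stride, the (left, top, width, height) box format, and the function names are all assumptions; the classifier that scores each window is left out.

```python
def sliding_windows(image_width, image_height, window, stride):
    """Yield square candidate regions as (left, top, width, height)."""
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield (left, top, window, window)

def l2_loss(predicted_box, true_box):
    """Sum of squared differences between predicted and true coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predicted_box, true_box))

# 224x224 windows (the VGG-16 input size) over a 448x448 image, 50% overlap.
windows = list(sliding_windows(448, 448, window=224, stride=112))
loss = l2_loss((10, 20, 100, 200), (12, 18, 100, 205))
```

Each window would first be screened by a fast linear classifier, and only promising regions would be passed to the CNN.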
Thank You
Label Category
0 Jacket
1 T-Shirt
2 Shirt
3 Jeans
4 Long-dress
5 Shoes
6 Trousers
7 Polo-shirts
8 Robe
9 Short-dress
10 Shorts
11 Suit
12 Sweater
13 Swimsuit
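The label table maps directly onto a lookup from the network's softmax output to a category name. A minimal sketch, assuming the output is a list of 14 probabilities ordered by label; the helper names are hypothetical.

```python
# Index in this list = label in the table above.
CATEGORIES = ["Jacket", "T-Shirt", "Shirt", "Jeans", "Long-dress", "Shoes",
              "Trousers", "Polo-shirts", "Robe", "Short-dress", "Shorts",
              "Suit", "Sweater", "Swimsuit"]

def predict_category(probabilities):
    """Return (label, category name) for the most probable class."""
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label, CATEGORIES[label]

def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

probabilities = [0.01] * 14
probabilities[3] = 0.87  # toy softmax output peaking at label 3
label, name = predict_category(probabilities)
```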