
Diversity in Fashion Recommendation Using Semantic Parsing

Sagar Verma1, Sukhad Anand1, Chetan Arora1, Atul Rai2

1Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Delhi.

2Staqu Technologies

ICIP, 2018


Problem Statement

Recommendation based on contextual similarity

[Figure: a query image with relevant item images for its hat, dress, and bag, contrasting images retrieved by finding similarity between features computed over the whole image with images retrieved by finding similarity between features computed over semantically similar parts of the image.]
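To make the contrast concrete, the sketch below illustrates the two retrieval strategies with cosine similarity over precomputed features. This is only an illustration, not the paper's implementation: the feature extractors are assumed to exist elsewhere, and the part names ("hat", "dress", "bag") are just examples of semantic parts.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_whole_image(query_feat, gallery_feats, top_k=5):
    # Whole-image retrieval: one global feature per gallery image.
    scores = [(i, cosine_sim(query_feat, g)) for i, g in enumerate(gallery_feats)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]

def retrieve_by_part(query_parts, gallery_parts, part, top_k=5):
    # Part-based retrieval: compare features of one semantic part
    # (e.g. "hat", "dress", "bag") across gallery images that contain it.
    scores = [(i, cosine_sim(query_parts[part], parts[part]))
              for i, parts in enumerate(gallery_parts) if part in parts]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]
```

Running retrieve_by_part once per detected part of the query image yields a separate, and typically different, result list for each part, which is the source of the diversity in the recommendations.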


Contributions

1. Our method recommends similar images based on different parts of a query image.
2. To identify different parts, we use attention and weakly labeled data.
3. Instead of features from standard pre-trained neural networks, we suggest using texture-based features.
4. We evaluate our method on the item recognition, consumer-to-shop retrieval, and in-shop retrieval tasks.


Related Work

- Cloth parsing,
- Clothing attribute recognition,
- Detecting fashion style, and
- Cross-domain image retrieval using Siamese and Triplet networks.


Proposed Architecture

[Architecture diagram: the input image is passed through the first six convolution layers of StyleNet to obtain a feature map f_I. An ST-LSTM (Spatial Transformer + LSTM) localization network attends to a different clothing item present in the image at each time step, producing attention maps M_k and attended features f_I^k; each attended feature is passed through a texture encoding layer, and the outputs are combined into category-wise max-pooled labels.]
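Reading the diagram as a recurrent loop, a rough PyTorch-style sketch of the attention stage might look as follows. This is our reading of the figure, not the authors' code: the class name, the use of a global average of f_I as the LSTM input, and the choice of a full 6-parameter affine transform are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STLSTMAttention(nn.Module):
    # Hypothetical sketch of the ST-LSTM loop: at each time step the LSTM
    # predicts affine parameters, and a spatial transformer extracts the
    # corresponding region of the feature map f_I (one clothing item per step).
    def __init__(self, feat_channels, hidden_size, num_steps):
        super().__init__()
        self.num_steps = num_steps
        self.lstm = nn.LSTMCell(feat_channels, hidden_size)
        self.to_theta = nn.Linear(hidden_size, 6)  # 2x3 affine transform

    def forward(self, f_I):
        # f_I: feature map from the first six StyleNet conv layers, shape (B, C, H, W).
        B = f_I.size(0)
        pooled = f_I.mean(dim=(2, 3))              # global summary fed to the LSTM
        h = f_I.new_zeros(B, self.lstm.hidden_size)
        c = f_I.new_zeros(B, self.lstm.hidden_size)
        attended = []
        for _ in range(self.num_steps):
            h, c = self.lstm(pooled, (h, c))
            theta = self.to_theta(h).view(B, 2, 3)
            grid = F.affine_grid(theta, f_I.size(), align_corners=False)
            attended.append(F.grid_sample(f_I, grid, align_corners=False))
        return attended  # one attended feature map f_I^k per time step
```

Each attended feature map would then pass through the texture encoding layer before the category-wise max-pooled label prediction.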


CNN for Global Image Features

[Same architecture diagram, with the global feature extractor highlighted: the input image is passed through the first six convolution layers of StyleNet to produce the feature map f_I.]


Visual Attention Module

[Same architecture diagram, with the visual attention module highlighted: the ST-LSTM localization network takes the feature map f_I together with its hidden and cell states (h_k, c_k) and, at each time step, produces an attention map M_{k+1} over a different clothing item present in the image.]


Texture Encoding Layer

[Same architecture diagram, with the texture encoding layers highlighted: each attended feature f_I^k is passed through a texture encoding layer before the category-wise max-pooled labels are predicted.]


Multi-label classification loss

L_{cls} = \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} (p_i^c - \hat{p}_i^c)^2    (1)

where N is the number of training samples, C is the total number of classes, \hat{p}_i is the ground-truth label vector of sample i, and p_i is the predicted label vector of sample i.
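As a point of reference, Eq. (1) amounts to a sum of squared differences between the predicted and ground-truth label vectors, averaged over the batch. A minimal PyTorch sketch, assuming pred and target are (N, C) tensors:

```python
import torch

def classification_loss(pred, target):
    # Eq. (1): squared error between the predicted label vector p_i and the
    # ground-truth label vector p̂_i, summed over classes, averaged over samples.
    return ((pred - target) ** 2).sum(dim=1).mean()
```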


Diversity loss

Diversity loss is the correlation between adjacent attention maps,

L_{div} = \frac{1}{K-1} \sum_{k=2}^{K} \sum_{i=1}^{H \times W} l_{k-1,i} \cdot l_{k,i}    (2)

where K is the total number of recurrent attention steps, H × W is the height and width of the attention maps, and l_k is the k-th attention map.
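A minimal sketch of Eq. (2), assuming the K attention maps are stacked into a single (K, H, W) tensor:

```python
import torch

def diversity_loss(attn_maps):
    # Eq. (2): element-wise product of adjacent attention maps l_{k-1} and l_k,
    # summed over all H*W locations and averaged over the K-1 adjacent pairs.
    K = attn_maps.size(0)
    flat = attn_maps.reshape(K, -1)                 # (K, H*W)
    overlap = (flat[:-1] * flat[1:]).sum(dim=1)     # one term per adjacent pair
    return overlap.sum() / (K - 1)
```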


Localisation loss

The localization loss L_{loc} from [1] is used to remove redundant locations and to force the localization network to look at small clothing parts.


Combined Loss

L = L_{cls} + \lambda_1 L_{div} + \lambda_2 L_{loc}    (3)

where λ_1 and λ_2 are weighting factors; we use 0.01 for both in all our experiments.
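Combining the terms is then a weighted sum, reusing the loss sketches above. The localization loss of [1] is not reproduced here; it is passed in as a precomputed value (loc_term is a placeholder name):

```python
def total_loss(pred, target, attn_maps, loc_term, lambda1=0.01, lambda2=0.01):
    # Eq. (3): L = L_cls + lambda1 * L_div + lambda2 * L_loc,
    # with lambda1 = lambda2 = 0.01 in all experiments.
    return (classification_loss(pred, target)
            + lambda1 * diversity_loss(attn_maps)
            + lambda2 * loc_term)
```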


Datasets

- Fashion144K [2]
  - 90,000 images with multi-label annotation.
  - 128 classes.
  - Image resolution is 384 × 256.
- Fashion550K [3]
  - 66 classes.
- DeepFashion [4]
  - 800,000 images.
  - Similarity pairs are available for the consumer-to-shop and in-shop retrieval tasks.


Experiments

- The model is trained on the Fashion144K [2] dataset with 59 item labels; color labels were excluded.
- The item recognition task is evaluated on the Fashion144K [2] and Fashion550K [3] datasets.
- The consumer-to-shop and in-shop retrieval tasks are evaluated on the DeepFashion [4] dataset.


Results

Model               Fashion144k [2]      Fashion550k [3]
                    AP_all    mAP        AP_all    mAP
StyleNet [2]        65.6      48.34      69.53     53.24
Baseline [3]        62.23     49.66      69.18     58.68
Veit et al. [5]     NA        NA         78.92     63.08
Inoue et al. [3]    NA        NA         79.87     64.62
Ours                82.78     68.38      82.81     57.93

Multi-label classification on Fashion144k [2] and Fashion550k [3].


Results

[Figure: average retrieval accuracy versus number of retrieved images (k = 1 to 50) on the DeepFashion dataset [4]. (a) In-shop retrieval: WTBI (0.506), DARN (0.67), FashionNet (0.764), Ours (0.784). (b) Consumer-to-shop retrieval: WTBI (0.063), DARN (0.111), FashionNet (0.188), Ours (0.253).]


Results

[Figure: semantically similar retrieval results for six query images from the Fashion144k dataset [2] using our method; for each query, different result sets correspond to different detected parts (e.g. hat, dress, bag, belt, boots, skirt, coat, glasses, shoes, jeans).]


Conclusion

- Using clothing parts for recommendation gives much greater variability in the recommendation results.
- Attention can be used to learn discriminative features from weak labels.
- Texture cues are important for learning different parts.


References

[1] Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, and Liang Lin, "Multi-label image recognition by recurrently discovering attentional regions," in ICCV, 2017.

[2] E. Simo-Serra and H. Ishikawa, "Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction," in CVPR, 2016, pp. 298-307.

[3] Naoto Inoue, Edgar Simo-Serra, Toshihiko Yamasaki, and Hiroshi Ishikawa, "Multi-label fashion image classification with minimal human supervision," in ICCVW, 2017, pp. 2261-2267.

[4] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, "DeepFashion: Powering robust clothes recognition and retrieval with rich annotations," in CVPR, 2016, pp. 1096-1104.

[5] Andreas Veit et al., "Learning from noisy large-scale datasets with minimal supervision," in CVPR, 2017.


Thank you. Questions?
