Page 1

End-to-End Localization and Ranking for Relative Attributes

Krishna Kumar Singh and Yong Jae Lee

Presented by Minhao Cheng

Page 2

Visual attributes

[Figure: example images for the attributes High heel, Smile, Mountainous, Cozy]

[Farhadi et al. 2009, Kumar et al. 2009, Lampert et al. 2009, Berg et al. 2010, Rastegari et al. 2012, …]

[Slide: Xiao and Lee, ICCV 2015]

Page 3

Relative attributes

Is she smiling? Hard to say. It is a lot easier to say "the one on the right is smiling more."

[Figure: two face images, left < right in smile strength]

[Parikh & Grauman 2011, Shrivastava et al. 2012, Kovashka et al. 2013, Sandeep et al. 2014, …]

[Slide: Xiao and Lee, ICCV 2015]

Page 4

Localization of attributes

Spatial regions that are most relevant to a particular attribute

[Figure: localized regions for the attributes Smile, Mountainous, Cozy]

[Slide: Xiao and Lee, ICCV 2015]

Page 5

Prior work on localizing attributes

• Attribute localization with human-in-the-loop: [Duan et al. 2012]

• Attribute localization with pre-trained detectors: [Bourdev et al. 2011, Zhang et al. 2014, Sandeep et al. 2014]

• Attribute localization with binary attributes: [Berg et al. 2010, Bourdev et al. 2011, Duan et al. 2012, Zhang et al. 2014]

Requires strong human supervision or binary attribute annotations

[Slide: Xiao and Lee, ICCV 2015]

Page 6

Prior work on localizing attributes

• Attribute localization in weakly-supervised setting: [Xiao and Lee, ICCV 2015]

“Pipeline” where features, localizer, and classifier are trained separately and sequentially; suboptimal and slow

[Slide: Xiao and Lee, ICCV 2015]

Page 7

Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network

End-to-end network for attribute localization and ranking

[Singh and Lee, ECCV 2016]

Page 8

Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network

End-to-end network for attribute localization and ranking

[Figure: training on ordered image pairs for the attribute Smile]

[Singh and Lee, ECCV 2016]

Page 9

Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network

End-to-end network for attribute localization and ranking

[Figure: training on ordered image pairs for the attribute Smile; at test time, test images are ranked from weak to strong]

Page 10

Overview of our end-to-end approach

[Singh and Lee, ECCV 2016]

[Figure: two Siamese branches, S1 and S2, each a Localization Network followed by a Ranker Network. Image I1 passes through S1 to give score V1, image I2 passes through S2 to give score V2, and both scores feed the Loss Function. Attribute: Smile]

• Goal: Given pairs of ordered training images, simultaneously localize the attribute in each image and learn a ranker

Page 11

Our end-to-end approach

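[Figure: architecture. Localization Network: image I → layers of width 96, 256, 384, 384, 384, 128, 128 → 3 outputs (θ) → Grid generator. Ranker Network: layers of width 96, 256, 384, 384, 384, 4096, 4096, followed by 8192 and a single output (score V)]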

[Singh and Lee, ECCV 2016]

Page 12

Our end-to-end approach

[Figure: Localization Network: image I → layers of width 96, 256, 384, 384, 384, 128, 128 → 3 outputs (θ) → Grid generator]

• Localization network discovers the region-of-interest for the attribute

• Learns transformation parameters θ that map the input image to the output region

• Based on Spatial Transformer Networks [Jaderberg et al. 2015]

[Singh and Lee, ECCV 2016]
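The three outputs of the localization network can be read as the parameters of a restricted affine transform. The sketch below is a minimal illustration, assuming θ = (s, tx, ty) means one isotropic scale and a 2-D translation in normalized [-1, 1] image coordinates, as in Spatial Transformer Networks. The function names, the nested-list image format, and the zero padding outside the image are illustrative choices, not the authors' implementation.

```python
import math


def make_grid(theta, out_h, out_w):
    """Return source coordinates (x, y) in [-1, 1] for every output pixel."""
    s, tx, ty = theta
    grid = []
    for i in range(out_h):
        # Normalized target coordinate in [-1, 1]; a single pixel maps to 0.
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Restricted affine map [[s, 0, tx], [0, s, ty]] applied to (xt, yt, 1).
            row.append((s * xt + tx, s * yt + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2-D image (list of rows) at the grid points; pixels outside read as 0."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Map normalized coordinates back to pixel indices.
            x = (xs + 1.0) * (w - 1) / 2.0
            y = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            out_row.append(
                pixel(y0, x0) * (1 - dx) * (1 - dy)
                + pixel(y0, x0 + 1) * dx * (1 - dy)
                + pixel(y0 + 1, x0) * (1 - dx) * dy
                + pixel(y0 + 1, x0 + 1) * dx * dy
            )
        out.append(out_row)
    return out
```

With θ = (1, 0, 0) the grid covers the whole image; a smaller s zooms in, and (tx, ty) moves the window. Because sampling is bilinear, the output is differentiable with respect to θ, which is what lets the localizer be trained from the ranking loss alone.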

Page 13

Our end-to-end approach

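[Figure: architecture. Localization Network: image I → layers of width 96, 256, 384, 384, 384, 128, 128 → 3 outputs (θ) → Grid generator. Ranker Network: layers of width 96, 256, 384, 384, 384, 4096, 4096, followed by 8192 and a single output (score V)]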

• Ranker network takes the localized region to produce a ranking score

• Combines the localized region with the full image for global context

[Singh and Lee, ECCV 2016]
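One way to read the 4096, 4096, 8192, 1 sequence in the figure is that a 4096-d feature of the localized region and a 4096-d feature of the full image are concatenated into 8192 dimensions and mapped to one score. That reading is an assumption about the figure. The sketch below shows only the concatenate-then-score step, with a hypothetical function name and a plain linear layer standing in for the final layer.

```python
def ranking_score(local_feat, global_feat, weights, bias=0.0):
    """Concatenate local and global features and map them to one scalar score."""
    combined = list(local_feat) + list(global_feat)
    if len(weights) != len(combined):
        raise ValueError("need one weight per concatenated feature dimension")
    # Single linear unit: weighted sum of all concatenated features plus a bias.
    return sum(w * x for w, x in zip(weights, combined)) + bias
```

With two 4096-d inputs, `weights` would have 8192 entries, matching the widths shown in the figure.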

Page 14

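Training

[Figure: Siamese branches S1 and S2, each a Localization Network followed by a Ranker Network. I1 passes through S1 to give V1, I2 passes through S2 to give V2, and both scores feed the Loss Function. Attribute: Smile]

• Loss function: cross entropy over the ranking scores V1 and V2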

[Singh and Lee, ECCV 2016]
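The formula itself is not part of the transcript. A common choice for Siamese ranking networks is the RankNet-style pairwise cross entropy: the score gap V1 - V2 is passed through a logistic function to give P, and the loss is -L log P - (1 - L) log(1 - P), where L is the pair label. The sketch below assumes that form, and the label convention (1, 0, or 0.5 for "similar") is also an assumption.

```python
import math


def pair_probability(v1, v2):
    """Logistic of the score gap: modeled probability that image 1 ranks above image 2."""
    d = v1 - v2
    # Numerically stable sigmoid.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def pairwise_cross_entropy(v1, v2, label):
    """Cross entropy between the pair label and pair_probability(v1, v2).

    label: 1.0 if image 1 shows more of the attribute, 0.0 if image 2 does,
    0.5 if the two are judged similar.
    """
    d = v1 - v2
    # -L*log(P) - (1-L)*log(1-P) simplifies to log(1 + exp(d)) - L*d.
    softplus = max(d, 0.0) + math.log1p(math.exp(-abs(d)))
    return softplus - label * d
```

The loss shrinks as the score gap agrees more strongly with the label. Both branches share weights, so the gradient from one pair updates the single underlying network twice.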

Page 15

Training

[Figure: Siamese branches S1 and S2, each a Localization Network followed by a Ranker Network. I1 passes through S1 to give V1, I2 passes through S2 to give V2, and both scores feed the Loss Function. Attribute: Smile]

• The localized region can fall outside the image bounds, which makes learning difficult

[Singh and Lee, ECCV 2016]
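The transcript does not say how this is handled. One possible remedy is to add a penalty that grows as the region leaves the image. The sketch below assumes the region is parameterized by θ = (s, tx, ty) in normalized [-1, 1] coordinates, so the window spans tx ± s and ty ± s. The squared-overshoot penalty and the function names are illustrative and are not taken from the slides.

```python
def region_bounds(theta):
    """Corners (x_min, y_min, x_max, y_max) of the window in normalized coordinates."""
    s, tx, ty = theta
    half = abs(s)
    return (tx - half, ty - half, tx + half, ty + half)


def out_of_bounds_penalty(theta):
    """Sum of squared distances by which the window extends past [-1, 1]."""
    x_min, y_min, x_max, y_max = region_bounds(theta)
    overshoot = [
        max(0.0, -1.0 - x_min),  # past the left edge
        max(0.0, -1.0 - y_min),  # past the top edge
        max(0.0, x_max - 1.0),   # past the right edge
        max(0.0, y_max - 1.0),   # past the bottom edge
    ]
    return sum(o * o for o in overshoot)
```

The penalty is zero while the window stays inside the image, so it does not interfere with the ranking loss in the normal case.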

Page 16

Training

[Figure: Siamese branches S1 and S2, each a Localization Network followed by a Ranker Network. I1 passes through S1 to give V1, I2 passes through S2 to give V2, and both scores feed the Loss Function. Attribute: Smile]

• Optimized with backpropagation and mini-batch stochastic gradient descent

[Singh and Lee, ECCV 2016]

Page 17

Progression of localized region over training epochs

[Figure: heatmaps of the localized region over training epochs for the attributes Smile and Dark hair]

• Heatmap: distribution of the localized region across the entire training dataset

[Singh and Lee, ECCV 2016]

Page 18

Testing

[Figure: test image → Siamese branch S1 (Localization Network, then Ranker Network) → score Vtest]

• Localize the relevant attribute region

• Produce a ranking score for the test image

[Singh and Lee, ECCV 2016]

Page 19

Experiments: Relative attribute datasets

• LFW-10 (2k images) [Sandeep et al., CVPR 2014]: Visible teeth, Eyes open, Dark hair, Smile, Good looking, ...

• UT-Zap50K (50k images) [Yu & Grauman, CVPR 2014]: Pointy, Open, Sporty, Comfort

[Singh and Lee, ECCV 2016]

Page 20

Results: Discovered regions and ranking on LFW-10 Faces

[Figure: images ranked from weak to strong, with discovered regions, for the attributes Bald, Dark hair, Eyes open, Smile]

• Our network discovers relevant attribute regions

• Leads to accurate rankings

Page 21

Results: Discovered regions and ranking on LFW-10 Faces

[Figure: images ranked from weak to strong, with discovered regions, for the attributes Masculine, Good looking, Young]

• Global attributes are harder to interpret

• For these, the network focuses on larger areas

[Singh and Lee, ECCV 2016]

Page 22

Results: Discovered regions and ranking on UT-Zap50K Shoes

[Figure: images ranked from weak to strong, with discovered regions, for the attributes Open, Pointy, Sporty, Comfort]

[Singh and Lee, ECCV 2016]

Page 23

Results: Image pair ranking accuracy

• % of test image pairs whose predicted relative attribute ranking is correct

• State-of-the-art results on LFW-10, UT-Zap50K, OSR, Shoe-with-Attribute

Combining global image context with localized fine-grained information performs best

[Singh and Lee, ECCV 2016]
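The accuracy metric above can be sketched as follows. The data layout is an assumption made for illustration: `scores` maps an image id to its predicted ranking score, and each ground-truth pair is written as (weaker_id, stronger_id).

```python
def pair_ranking_accuracy(scores, ordered_pairs):
    """Percentage of ground-truth pairs whose predicted order is correct."""
    if not ordered_pairs:
        raise ValueError("need at least one test pair")
    # A pair counts as correct when the stronger image gets the higher score.
    correct = sum(1 for weak, strong in ordered_pairs if scores[strong] > scores[weak])
    return 100.0 * correct / len(ordered_pairs)


scores = {"a": 0.1, "b": 0.7, "c": 0.4}
pairs = [("a", "b"), ("c", "b"), ("b", "c"), ("a", "c")]
accuracy = pair_ranking_accuracy(scores, pairs)  # 3 of the 4 pairs are ordered correctly
```

Ties in the predicted scores count as errors here; a benchmark that includes "similar" pairs would need its own rule for them.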

Page 24

Conclusions

• Novel end-to-end network for ranking and localizing attributes.

• State-of-the-art attribute ranking performance on benchmark face, shoe, and outdoor scene datasets.

• Our approach is 100 times faster than [Xiao & Lee].

Page 25

Question

• What if we used multiple localization networks instead of one to get better performance? (For example, features from the eye region could also help rank the smile attribute.)