adapted deep embeddings: a synthesis of methods for -shot ...04-15-30)-04...(ustinova&...

Adapted Deep Embeddings: A Synthesis of Methods for !-Shot

Inductive Transfer Learning

Tyler R. Scott1,2, Karl Ridgeway1,2, Michael C. Mozer1,3

1 University of Colorado, Boulder2 Sensory Inc.

3 Presently at Google Brain


ModelTarget Domain Input

Target Domain Prediction

SourceDomain

Data

Target Domain

Data

Weight Transfer

Retrain output

Adapt weightsto target domain


Yosinski et al. (2014)

Source Domain

…

Target Domain

…

Weight TransferDeep Metric Learning


Source Domain

…

Source & TargetDomain Embedding Histogram loss

(Ustinova & Lempitsky, 2016)

Distance

Within class

Betweenclass

Weight TransferDeep Metric LearningFew-Shot Learning


Source Domain

…

Source & TargetDomain Embedding

Prototypical nets(Snell et al., 2017)

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Tyler Scott

Weight TransferDeep Metric LearningFew-Shot Learning


Adapted DeepEmbeddings

1. Train network usingembedding loss• Histogram loss,

Prototypical nets2. Adapt weights using

limited target-domaindata

Source Domain

…

Target Domain

…

Why hasn’t a comparison been explored?


# labeled examples per target class

(k)Weight Transfer > 100Deep Metric Learning agnosticFew-Shot Learning < 20

Target Domain

k labeled examples per

class

Source Domain

2200 labeled examples per

class

MNIST

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0Te

st A

ccur

acy

MNIST, n = 5

Baseline

Targ

et D

omai

nTe

st A

ccur

acy

Labeled Examples in Target Domain (k)

MNIST

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0Te

st A

ccur

acy

MNIST, n = 5

Weight AdaptationBaseline

Targ

et D

omai

nTe

st A

ccur

acy


MNIST

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0Te

st A

ccur

acy

MNIST, n = 5

Prototypical NetWeight AdaptationBaseline

Targ

et D

omai

nTe

st A

ccur

acy


MNIST

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0Te

st A

ccur

acy

MNIST, n = 5

Histogram LossPrototypical NetWeight AdaptationBaseline

Targ

et D

omai

nTe

st A

ccur

acy


MNIST

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0Te

st A

ccur

acy

MNIST, n = 5

Adapted Histogram LossAdapted Prototypical NetHistogram LossPrototypical NetWeight AdaptationBaseline

Targ

et D

omai

nTe

st A

ccur

acy


MNIST

1 10 50 100 200k

0.20.30.40.50.60.70.80.91.0

Isolet, n = 5

1 10 50 100 200k

0.20.30.40.50.60.70.80.91.0

Isolet, n = 10

5 10 100 1000n

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Test

Acc

urac

y

Omniglot, k = 1

5 10 100 1000n

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9Omniglot, k = 5

5 10 100 1000n

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9Omniglot, k = 10

1 10 50 100 300k

0.2

0.3

0.4

0.5

0.6

0.7

Test

Acc

urac

ytinyImageNet, n = 5

1 10 50 100 300k

0.1

0.2

0.3

0.4

0.5

0.6tinyImageNet, n = 10

1 10 50 100 300k

0.0

0.1

0.2tinyImageNet, n = 50

1 10 50 100 200k

0.20.30.40.50.60.70.80.91.0

Isolet, n = 5

1 10 50 100 200k

0.20.30.40.50.60.70.80.91.0

Isolet, n = 10

1 5 10 50 100 500 1000k

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Test

Acc

urac

y

MNIST, n = 5

Adapted Histogram LossAdapted Prototypical NetHistogram LossPrototypical NetWeight AdaptationBaseline

Conclusion

• Weight transfer is the least effective method for inductive transfer learning

• Histogram loss is robust regardless of the amount of labeled data in the target domain

• Adapted embeddings outperform every static embedding method previously proposed

Poster #167Room 210 & 230 AB

Today, 5:00 - 7:00 PM

adapted deep embeddings: a synthesis of methods for -shot ...04-15-30)-04...(ustinova&...

Documents