embeddings and uncertainty · 2019-10-07 · embeddings and uncertainty oct. 4, 2019 andrew...
TRANSCRIPT
Embeddings and UncertaintyOct. 4, 2019
Andrew Gallagher [email protected]
Seong Joon Oh, Joseph Roth, Jiyan Pan, Florian Schroff, Kevin Murphy, Mike Mozer, Caroline Pantofaru, Michael Nechyba, Cusuh Ham, Zhou Zhou, Abinash Behera
Andrew Gallagher
longing for certainty ... is in every human mind. But certainty generally is illusion.
Oliver Wendell Holmes
Bertrand Russell
Andrew Gallagher
Would you recognize this person if you needed to?
Andrew Gallagher
What about one of these?
Bulthoff and Edelman, Psychophysical support for a two-dimensional view interpolation theory of object recognition, 1992. (link)
Andrew Gallagher
How about this?
Bulthoff, Object Recognition in Man and Machine (link)
Andrew Gallagher
How about this?
Bulthoff, Object Recognition in Man and Machine (link)
Andrew Gallagher
Introduction
Embeddings
Uncertainty Representations (2)
Discussion
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Introduction
Embeddings
Uncertainty Representations (2)
Discussion
Andrew Gallagher WNYIPW2019 Google
Anchor
Positive
Negative
Anchor
Positive
NegativeLearn
Margin
FaceNet
slide content: F. Schroff and D. Kalenichenko
Encourages a margin between the ||anchor - positive|| and ||anchor - negative||.
F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering https://arxiv.org/abs/1503.03832
Anchor
Positive
Negative
Anchor
Positive
NegativeLearn
Margin
FaceNet
slide content: F. Schroff and D. Kalenichenko
Training
Training
?
Testing
Testing
Andrew Gallagher
Introduction
Embeddings
Uncertainty Representations (1)
Discussion
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Are two images the same class or not?
We typically employ:
Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher, Modeling Uncertainty with Hedged Instance Embedding, https://arxiv.org/abs/1810.00319
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: ?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: 47?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: ?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: 17?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: ?
?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: ?
?
Andrew Gallagher
Commonly done: Point embedding.
true class: 17
true class: 47
class: ?
?
Andrew Gallagher
Hedged Instance Embeddings
true class: 17
true class: 47
class: ?
Andrew Gallagher
true class: 17
true class: 47
class: ?
Andrew Gallagher
Metric embedding as distributions.Formulate as discriminative task: given pair, predict match.
Introduce embedding as latent variable:
Embedding distributions must be:parameterizablesampleable
Andrew Gallagher
Match probability estimation.
Andrew Gallagher
Computing
Andrew Gallagher
Computing
Andrew Gallagher
Computing
Andrew Gallagher
Computing
Andrew Gallagher
sigmoid matchlikelihood
Computing
Andrew Gallagher
Computing
sigmoid matchlikelihood
Scratch Demo
Andrew Gallagher
End-to-end differentiable with reparametrization trick.
Shared params
Andrew Gallagher
MoG embedding with multiple samples case.
Shared params
Andrew Gallagher
Training objective - Variational Information Bottleneck
Derived from the Variational Information Bottleneck [from Bayesflow team]:
Log likelihood term: Binary cross-entropy loss for match prediction.
KL term: Controls the compression level for z. Unit Gaussian.
Andrew Gallagher
We desire a measure that, given input , one can guess its performance on downstream tasks (e.g., verification, recognition).
Uncertainty measure: Self-mismatch.
Andrew Gallagher
uncertain certainsomewhat uncertain
MoG-1 MoG-1MoG-2
Uncertainty measure: Self-mismatch.We desire a measure that, given input , one can guess its performance on downstream tasks (e.g., verification, recognition).
Andrew Gallagher
Dataset and Experiments
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Experimental setup.
Andrew Gallagher
Clean images Corrupt images
2 Digit MNIST → 2 dimension Hedged Instance Embeddings (MoG-1)
Andrew Gallagher
Clean images Corrupt images
3 Digit MNIST → 3 dimension Hedged Instance Embeddings (MoG-1)
Andrew Gallagher
2 Digit → 2 Dim
Hedged Instance Embeddings (MoG-1)
Andrew Gallagher
Task Performance
Andrew Gallagher
Task Performance
Andrew Gallagher
Most certain Least certain
Andrew Gallagher
Uncertainty Measure
Andrew Gallagher
PetNet with Uncertainty
20-dim embedding0.25 MobileNet
High Certainty Samples Low Certainty Samples
Andrew Gallagher
Uncertain Pet *
roger901
Certain Pet *
unbunt
* Not the actual photos, but similar.
Andrew Gallagher
Introduction
Embeddings
Uncertainty Representations (2)
Discussion
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Deep Convolutional Neural Network Features and the Original Image
Andrew Gallagher
Why is L2-Norm Useful?
Andrew Gallagher
Probabilistic Face Embeddings, Shi & Jain, 2019, https://arxiv.org/abs/1904.09658.
Fashion MNIST
MNIST
Datasets
Architecture50
D
10normalization
D=3, normalized, cross-entropy loss
D=4, normalized, triplet loss
Andrew Gallagher
Staying close to home ...
Andrew GallagherFa
ceN
et e
mbe
ddin
g
penu
ltim
ate
embe
ddin
g
feature extraction
norm
aliz
e
L2
sigmoid embedding confidence
Embedding
Andrew GallagherFa
ceN
et e
mbe
ddin
g
penu
ltim
ate
embe
ddin
g
Embedding
feature extraction
norm
aliz
e
L2
sigmoid embedding confidence
Distilled Embedding Confidence
distilled embedding confidence
Andrew Gallagher
Recognizable Not Recognizable0.123 0.0860.1640.2700.3200.3980.6180.630 0.4570.8650.935
Andrew Gallagher
Introduction
Embeddings
Uncertainty Representations (2)
Discussion
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Real-world applications
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Andrew Gallagher
A boring image.
Andrew Gallagher
A corrupted image.
Andrew Gallagher
ApplicationsGallery Selection: Given a potentially large collection of images or a video. Select a subset of images/frames that will most reliably identify the individual.
Computational Efficiency: Prevent running the expensive models on images with high uncertainty.
Tracking: Use as the confidence score.
Offline Face Clustering: Confidence weighted cluster similarity instead of top-N or other heuristic based approaches.
Andrew Gallagher
Open Questions
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Open Questions
How best to quantify and measure uncertainty?
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Open Questions
Tell us when you don’t know.
But only when absolutely necessary.
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Open Questions
How best to compute, index, and match uncertain embeddings (at scale)?
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Open Questions
What network architectures encourage learning uncertainty?
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Open Questions
Open world (epistemic) uncertainty.
Andrew Gallagher WNYIPW2019 Google
Andrew Gallagher
Thank you
Andrew Gallagher WNYIPW2019 Google