

LaSO: Label-Set Operations network for multi-label few-shot classification

IBM Research AI

Amit Alfassy*1,3, Leonid Karlinsky*1, Amit Aides*1, Joseph Shtok1, Sivan Harary1, Rogerio Feris1, Raja Giryes2, Alex M. Bronstein3

1 - IBM Research AI; 2 - School of Electrical Engineering, Tel-Aviv University; 3 - Department of Computer Science, Technion

Essence: train generic (label-agnostic) DL-based operators that manipulate the semantic content of data points using reference examples. For example, given two images A and B, generate, in some feature space, a sample C whose labels correspond to the intersection, union, or subtraction of the labels of A and B (without necessarily assuming familiarity with these labels during training).

Application: multi-label few-shot classification – given a few samples with mixtures of novel categories, learn a (multi-label) classifier for these categories.
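The label-set semantics that the operators are meant to reproduce can be illustrated directly on multi-hot label vectors. The sketch below only computes the target label sets for a pair of images; it does not touch feature vectors. The category names and the helper names (`to_multi_hot`, `label_set_targets`) are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical 5-category vocabulary; the names are illustrative only.
CATEGORIES = ["person", "dog", "cat", "car", "bicycle"]


def to_multi_hot(labels):
    """Encode a set of category names as a boolean multi-hot vector."""
    return np.array([c in labels for c in CATEGORIES], dtype=bool)


def label_set_targets(a, b):
    """Target label vectors for the three set operations on labels a and b."""
    return {
        "intersection": a & b,   # labels present in both A and B
        "union": a | b,          # labels present in A or B
        "subtraction": a & ~b,   # labels of A that are absent from B
    }


labels_a = to_multi_hot({"person", "dog"})
labels_b = to_multi_hot({"person", "car"})
targets = label_set_targets(labels_a, labels_b)

for name, vec in targets.items():
    print(name, [c for c, on in zip(CATEGORIES, vec) if on])
```

Note that subtraction is the only one of the three that depends on the order of A and B.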

Figures: LaSO concept; LaSO model; examples of images retrieved by generated vectors.

Results: classification accuracy on COCO (mAP). Original (non-manipulated) feature vectors serve as the upper bound.

Results: multi-label few-shot classification of unseen categories (mAP). Original (non-manipulated) feature vectors serve as the upper bound.

Results: image retrieval accuracy on COCO (IoU).
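As a reading aid for the IoU retrieval score, the following is a minimal sketch of intersection-over-union between two label sets, for instance the labels of a retrieved image and the label set expected from the set operation. How the poster's retrieval protocol selects the retrieved image is not stated here, so only the score is shown. The function name `label_iou` and the convention of returning 1.0 for two empty sets are assumptions.

```python
def label_iou(retrieved_labels, expected_labels):
    """Intersection-over-union between two label sets."""
    retrieved, expected = set(retrieved_labels), set(expected_labels)
    union = retrieved | expected
    if not union:
        # Assumed convention: two empty label sets match perfectly.
        return 1.0
    return len(retrieved & expected) / len(union)


# One shared label out of three distinct labels overall.
score = label_iou({"person", "dog"}, {"person", "car"})
print(round(score, 3))
```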

Analytic variants of the set operations (in feature space)

Comparison of the learned operators with analytic alternatives: classification accuracy (mAP).
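This transcript does not list which analytic forms were compared. Purely as an illustration of what an analytic (non-learned) set operation in feature space can look like, the sketch below uses element-wise minimum, maximum, and clipped difference on non-negative feature vectors. These particular formulas and function names are assumptions, not the variants evaluated on the poster.

```python
import numpy as np


def analytic_intersection(fa, fb):
    """Assumed variant: keep only the activation shared by both vectors."""
    return np.minimum(fa, fb)


def analytic_union(fa, fb):
    """Assumed variant: keep the stronger activation of the two vectors."""
    return np.maximum(fa, fb)


def analytic_subtraction(fa, fb):
    """Assumed variant: remove fb's activation from fa, clipped at zero."""
    return np.maximum(fa - fb, 0.0)


# Toy non-negative feature vectors (e.g. post-ReLU activations).
fa = np.array([1.0, 0.0, 2.0])
fb = np.array([0.5, 3.0, 0.0])

print(analytic_intersection(fa, fb))
print(analytic_union(fa, fb))
print(analytic_subtraction(fa, fb))
```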

• The goal of these networks is to synthesize new feature vectors from pairs of input vectors.

• The semantic content of each synthesized vector will correspond to the prescribed operation on the two source vectors’ label sets.
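The two bullets above describe networks that map a pair of feature vectors to one synthesized feature vector. A minimal forward-pass sketch follows, using NumPy with random, untrained weights. The architecture shown (concatenating the pair and passing it through a two-layer MLP), the layer sizes, and the class name `OperatorNet` are assumptions for illustration; the poster does not specify them.

```python
import numpy as np

FEATURE_DIM = 8   # assumed; real backbone features are much larger
HIDDEN_DIM = 16   # assumed


class OperatorNet:
    """Hypothetical two-layer MLP: (feature A, feature B) -> synthesized feature."""

    def __init__(self, rng):
        self.w1 = rng.normal(0.0, 0.1, (2 * FEATURE_DIM, HIDDEN_DIM))
        self.b1 = np.zeros(HIDDEN_DIM)
        self.w2 = rng.normal(0.0, 0.1, (HIDDEN_DIM, FEATURE_DIM))
        self.b2 = np.zeros(FEATURE_DIM)

    def __call__(self, fa, fb):
        pair = np.concatenate([fa, fb], axis=-1)              # join the two inputs
        hidden = np.maximum(pair @ self.w1 + self.b1, 0.0)    # ReLU layer
        return hidden @ self.w2 + self.b2                     # same size as the inputs


rng = np.random.default_rng(0)
# One network per set operation.
nets = {op: OperatorNet(rng) for op in ("intersection", "union", "subtraction")}

fa = rng.normal(size=FEATURE_DIM)
fb = rng.normal(size=FEATURE_DIM)
synthesized = {op: net(fa, fb) for op, net in nets.items()}
print({op: vec.shape for op, vec in synthesized.items()})
```

The output has the same dimensionality as the inputs, so a classifier trained on real feature vectors can be applied to the synthesized ones unchanged.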

Panels: Intersection; Union; Subtraction; Subtraction of Intersection.

How similar are the fake vectors to the real vectors?

• InceptionV3 and ResNet34 are used as feature extractor backbones.

• The three LaSO networks, one for each set operation, are trained jointly end-to-end.

• The classifier is trained using only the real images.

• Classification loss (BCE) – “fooling” the external classifier.

• Mode collapse loss (MSE) – increasing output diversity.

• Symmetry loss (BCE and MSE) – enforcing commutativity.
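Two of the loss terms listed above can be sketched in a few lines of NumPy. The classification term applies BCE between the external classifier's output on a synthesized vector and the target label set. The symmetry term is shown only in its MSE form, comparing an operator's output on (A, B) with its output on (B, A); commutativity only makes sense for intersection and union. The mode collapse term is left out because its exact form is not given here. All function names, and the toy operators used in the demo, are assumptions.

```python
import numpy as np


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))


def mse(x, y):
    """Mean squared error between two arrays."""
    return float(np.mean((x - y) ** 2))


def classification_loss(classifier, synthesized, target_labels):
    """BCE of the classifier's prediction on a synthesized vector."""
    return bce(classifier(synthesized), target_labels)


def symmetry_loss(operator, fa, fb):
    """MSE between operator(A, B) and operator(B, A)."""
    return mse(operator(fa, fb), operator(fb, fa))


# Demo with toy stand-ins (not trained networks).
fa = np.array([1.0, 0.0, 2.0])
fb = np.array([0.5, 3.0, 0.0])
sym = symmetry_loss(np.maximum, fa, fb)     # a commutative operator
asym = symmetry_loss(np.subtract, fa, fb)   # a non-commutative operator
perfect = bce(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
print(sym, asym > 0.0, perfect < 1e-6)
```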

Generative approach: synthesize new samples from the few available examples and add them to the training set.
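The augmentation idea can be sketched as a loop over ordered pairs of the few available examples, applying each operator to the features and the matching set operation to the labels. In the sketch below, simple element-wise functions stand in for the trained LaSO networks, and synthesized samples with an empty label set are skipped; both choices, along with the function name `augment_support_set`, are assumptions for illustration.

```python
from itertools import permutations

import numpy as np


def augment_support_set(features, labels, operators):
    """Extend (features, labels) with samples synthesized from ordered pairs.

    features: (n, d) float array; labels: (n, c) boolean array;
    operators: name -> (feature_op, label_op), each taking two vectors.
    """
    aug_features, aug_labels = list(features), list(labels)
    for i, j in permutations(range(len(features)), 2):
        for feature_op, label_op in operators.values():
            new_labels = label_op(labels[i], labels[j])
            if not new_labels.any():
                continue  # assumed: skip samples that would carry no label
            aug_features.append(feature_op(features[i], features[j]))
            aug_labels.append(new_labels)
    return np.stack(aug_features), np.stack(aug_labels)


# Stand-in operators (NOT the trained networks), paired with label-set operations.
operators = {
    "union": (np.maximum, np.logical_or),
    "intersection": (np.minimum, np.logical_and),
    "subtraction": (lambda a, b: np.maximum(a - b, 0.0), lambda a, b: a & ~b),
}

rng = np.random.default_rng(0)
features = rng.random((2, 4))                      # two support examples
labels = np.array([[True, True, False],
                   [True, False, True]])           # their multi-hot labels
aug_features, aug_labels = augment_support_set(features, labels, operators)
print(aug_features.shape, aug_labels.shape)
```

The augmented arrays can then be fed to any multi-label classifier in place of the original few-shot set.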
