
Knockoff Nets: Stealing Functionality of Black-Box Models (Supplementary Material)

Tribhuvanesh Orekondy1 Bernt Schiele1 Mario Fritz2

1 Max Planck Institute for Informatics   2 CISPA Helmholtz Center for Information Security
Saarland Informatics Campus, Germany

A. Contents

The supplementary material contains:

A. Contents (this section)

B. Extended Descriptions

1. Black-box Models

2. Overlap Between PV and PA

3. Aggregating OpenImages and OpenImages-Faces

4. Additional Implementation Details

C. Extensions of Existing Results

1. Qualitative Results

2. Sample-efficiency of GT

3. Policies Learnt by adaptive Strategy

4. Reward Ablation

D. Auxiliary Experiments

1. Effect of CNN initialization

2. Seen and Unseen Classes

3. Adaptive Strategy: With/without hierarchy

4. Semi-open World: τd, τk

B. Extended Descriptions

In this section, we provide additional detailed descriptions and implementation details.

B.1. Black-box Models

We supplement Section 5.1 by providing extended descriptions of the blackboxes listed in Table 1 of the main paper. Each blackbox FV is trained on one particular image classification dataset.

PA                    PV: Caltech256   CUBS200    Indoor67   Diabetic5
                          (K=256)      (K=200)    (K=67)     (K=5)
ILSVRC (Z=1000)           108 (42%)    2 (1%)     10 (15%)   0 (0%)
OpenImages (Z=601)        114 (44%)    1 (0.5%)   4 (6%)     0 (0%)

Table S1: Overlap between PA and PV.

Black-box 1: Caltech256 [5]. Caltech-256 is a popular dataset for general object recognition, gathered by downloading relevant examples from Google Images and manually screening for quality and errors. The dataset contains 30k images covering 256 common object categories.

Black-box 2: CUBS200 [14]. A fine-grained bird classifier is trained on the CUBS-200-2011 dataset. This dataset contains roughly 30 train and 30 test images for each of 200 species of birds. Due to the low inter-class variance (species differ only subtly), collecting and annotating images is challenging even for expert bird-watchers.

Black-box 3: Indoor67 [11]. We introduce another fine-grained task of recognizing 67 types of indoor scenes. This dataset consists of 15.6k images collected from Google Images, Flickr, and LabelMe.

Black-box 4: Diabetic5 [1]. Diabetic Retinopathy (DR) is a medical eye condition characterized by retinal damage due to diabetes. Cases are typically determined by trained clinicians, who look for the presence of lesions and vascular abnormalities in digital color photographs of the retina captured using specialized cameras. Recently, a dataset of 35k such retinal image scans was made available as part of a Kaggle competition [1]. Each image is annotated by a clinician on a scale of 0 (no DR) to 4 (proliferative DR). This highly specialized biomedical dataset also presents challenges in the form of extreme imbalance (the largest class contains 30× as many images as the smallest one).



Figure S1: Performance of the knockoff at various budgets (enlarged version of Figure 5). Presented for various choices of the adversary's image distribution PA ∈ {D2, ILSVRC, OpenImages, PV} and sampling strategy π ∈ {adaptive, random}, for the Caltech256, CUBS200, Indoor67, and Diabetic5 blackboxes. The accuracy of the blackbox FV and chance-level performance are also indicated.

Figure S2: Training on GT vs. KD. Extension of Figure 5. We compare the sample efficiency of the first two rows in Table 2: "PV (FV)" (training with GT data) and "PV (KD)" (training with soft labels of GT images produced by FV).

B.2. Overlap: Open-world

In this section, we supplement Section 5.2.1 in the main paper by providing more details on how overlap was calculated in the open-world scenarios. We manually compute the overlap between the labels of the blackbox (K, e.g., the 256 Caltech classes) and the adversary's dataset (Z, e.g., the 1k ILSVRC classes) as: 100 × |K ∩ Z|/|K|. We consider two labels k ∈ K and z ∈ Z to overlap if: (a) they have the same semantic meaning; or (b) z is a type of k, e.g., z = "maltese dog" and k = "dog". The exact numbers are provided in Table S1. We remark that this is a soft lower bound. For instance, while ILSVRC contains "Hummingbird" and CUBS-200-2011 contains three distinct species of hummingbirds, this is not counted towards the overlap, as the adversary lacks the annotated data necessary to discriminate among the three species.
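For concreteness, a minimal sketch of this metric is given below. It assumes the semantic matches between victim and adversary labels have already been curated by hand; the matching itself is manual and is not automated here.

```python
# Minimal sketch of the overlap metric 100 * |K ∩ Z| / |K|, assuming the set of
# victim labels judged to match some adversary label has been curated by hand.
def overlap_percentage(victim_labels, matched_victim_labels):
    victim = set(victim_labels)
    matched = set(matched_victim_labels) & victim
    return 100.0 * len(matched) / len(victim)

# Example with the Caltech256 vs. ILSVRC numbers from Table S1:
# 108 of the 256 victim classes have a matching ILSVRC label.
print(overlap_percentage(range(256), range(108)))  # -> 42.1875, reported as 42%
```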

B.3. Dataset Aggregation

All datasets used in the paper (except OpenImages) have been used in the form made publicly available by the authors. We use a subset of OpenImages due to storage constraints imposed by its massive size (9M images).


Figure S3: Training with non-ImageNet initializations of knockoff models. Shown for various choices of blackboxes FV (subplots: MNIST with FV = LeNet, CIFAR10 with FV = ResNet-18, CIFAR100 with FV = ResNet-18) and the adversary's image distribution PA (lines). All victim blackbox models are trained from scratch (their test accuracy is indicated); all knockoff models are either trained from scratch or pretrained on the corresponding PA task (suffixed with '(pt)').

The descriptions of how these subsets were obtained are provided below.

OpenImages. We retrieve 2k images for each of the 600 OpenImages [8] "boxable" categories, resulting in 554k unique images. ∼19k images are removed for either being corrupt or representing Flickr's placeholder for unavailable images. This results in a total of 535k unique images.

OpenImages-Faces. We download all images (422k) from OpenImages [8] with the label "/m/0dzct: Human face" using the OID tool [13]. The bounding box annotations are used to crop faces (plus a margin of 25%) containing at least 180×180 pixels. We restrict to at most 5 faces per image to maintain diversity between train/test splits. This results in a total of 98k face images.
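A minimal sketch of the face-cropping step is shown below. The use of PIL and of normalized (XMin, XMax, YMin, YMax) box coordinates are assumptions about the pipeline rather than a description of the exact tooling used.

```python
# Hedged sketch of the OpenImages-Faces cropping step: expand each face box by
# a 25% margin on every side and keep crops of at least 180x180 pixels.
# Normalized (XMin, XMax, YMin, YMax) coordinates are assumed, as in the
# OpenImages box annotations.
from PIL import Image

def crop_face(image_path, box, margin=0.25, min_size=180):
    img = Image.open(image_path)
    W, H = img.size
    x0, x1, y0, y1 = box
    w, h = (x1 - x0) * W, (y1 - y0) * H
    left = max(0.0, x0 * W - margin * w)
    right = min(float(W), x1 * W + margin * w)
    top = max(0.0, y0 * H - margin * h)
    bottom = min(float(H), y1 * H + margin * h)
    if (right - left) < min_size or (bottom - top) < min_size:
        return None  # discard faces that are too small
    return img.crop((int(left), int(top), int(right), int(bottom)))
```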

B.4. Additional Implementation Details

In this section, we provide implementation details to supplement discussions in the main paper.

Input Transformations. While training the blackbox models FV, we augment the training data by applying input transformations: random 224×224 crops and horizontal flips. This is followed by normalizing the image using the standard ImageNet mean and standard deviation values. While training the knockoff model FA, and for evaluation, we resize the image to 256×256, obtain a 224×224 center crop, and normalize as before.

Training FV = Diabetic5. We train this model using a learning rate of 0.01 (compared to 0.1 for the other models) and a weighted loss. Due to the extreme imbalance between classes of the dataset, we weigh each class as follows. Let nk denote the number of images belonging to class k and let nmin = mink nk. We weigh the loss for each class k as nmin/nk. In our experiments, the weighted loss yields approximately an 8% absolute improvement in overall accuracy on the test set. However, the training of knockoffs of all blackboxes is identical in all aspects, including a non-weighted loss, irrespective of the victim blackbox targeted.
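A minimal sketch of this input pipeline and of the nmin/nk class weighting is shown below. The specific torchvision transform classes and the per-class counts are assumptions for illustration, not the exact configuration used.

```python
# Hedged sketch of the input pipeline and the class-weighted loss for the
# Diabetic5 blackbox: random 224x224 crops + horizontal flips for training,
# resize-256 / center-crop-224 for evaluation, ImageNet mean/std normalization.
import torch
import torch.nn as nn
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

def class_weights(class_counts):
    # Weight the loss of class k by n_min / n_k to counter class imbalance.
    n_min = min(class_counts)
    return torch.tensor([n_min / n_k for n_k in class_counts], dtype=torch.float)

# Hypothetical per-class counts, only to illustrate the weighting.
counts = [30000, 3000, 5000, 1500, 1000]
criterion = nn.CrossEntropyLoss(weight=class_weights(counts))
```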

Creating ILSVRC Hierarchy. We represent the 1k labels of ILSVRC as a hierarchy (Figure 4b) of the form: root node "entity" → N coarse nodes → 1k leaf nodes. We obtain the N (30 in our case) coarse labels as follows: (i) a 2048-d mean feature vector representation is obtained for each of the 1k labels using an ImageNet-pretrained ResNet; (ii) we cluster the 1k features into N clusters using scikit-learn's [10] implementation of agglomerative clustering; (iii) we obtain a semantic label per cluster (i.e., coarse node) by finding the common parent in the ImageNet semantic hierarchy.
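A minimal sketch of the clustering step (ii) is shown below; the feature extraction in step (i) and the semantic naming in step (iii) are omitted, and random features stand in for the real per-class means.

```python
# Hedged sketch of the coarse-label construction: cluster per-class mean
# features into N = 30 groups with scikit-learn's agglomerative clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def coarse_label_assignment(class_features, n_coarse=30):
    """class_features: array of shape (1000, 2048), one mean feature per label."""
    clustering = AgglomerativeClustering(n_clusters=n_coarse)
    return clustering.fit_predict(class_features)  # coarse cluster id per label

# Random features stand in for the real ImageNet-pretrained ResNet means.
assignments = coarse_label_assignment(np.random.randn(1000, 2048))
```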

Adaptive Strategy. Recall from Section 6 that we train the knockoff in two phases: (a) online, during transfer set construction; followed by (b) offline, where the model is retrained using the transfer set obtained thus far. In phase (a), we train FA with SGD (momentum 0.5) with a learning rate of 0.0005 and a batch size of 4 (i.e., 4 images sampled at each t). In phase (b), we train the knockoff FA from scratch on the transfer set using SGD (momentum 0.5) for 100 epochs with a learning rate of 0.01, decayed by a factor of 0.1 every 60 epochs. We used ∆ = 25.
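A minimal sketch of the offline phase (b) with these hyperparameters is given below. Consuming the transfer set via a soft-label cross-entropy against the blackbox posteriors is our reading of the setup; model and transfer_loader are placeholders.

```python
# Hedged sketch of the offline retraining phase (b): SGD with momentum 0.5,
# learning rate 0.01 decayed by 0.1 every 60 epochs, for 100 epochs.
import torch
from torch.optim.lr_scheduler import StepLR

def retrain_offline(model, transfer_loader, epochs=100, device="cpu"):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    scheduler = StepLR(optimizer, step_size=60, gamma=0.1)
    model.to(device).train()
    for _ in range(epochs):
        for images, posteriors in transfer_loader:  # (x_t, F_V(x_t)) pairs
            images, posteriors = images.to(device), posteriors.to(device)
            optimizer.zero_grad()
            log_probs = torch.log_softmax(model(images), dim=1)
            loss = -(posteriors * log_probs).sum(dim=1).mean()  # soft-label CE
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```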

C. Extensions of Existing Results

In this section, we present extensions of existing results discussed in the main paper.

C.1. Qualitative Results

Qualitative results to supplement Figure 6 are provided in Figures S4-S7. Each row in the figures corresponds to an output class of the blackbox whose images the knockoff has never encountered before. Images in the "transfer set" column were randomly sampled from ILSVRC [4, 12]. In contrast, images in the "test set" column belong to the victim's test set (Caltech256, CUBS-200-2011, etc.).

C.2. Sample Efficiency: Training Knockoffs on GT

We extend Figure 5 in the main paper to include training on the same ground-truth data used to train the blackboxes.


Figure S4: Qualitative results: Caltech256. Extends Figure 6 in the main paper. GT labels are underlined, correct knockoff top-1 predictions are shown in green and incorrect ones in red.

Figure S5: Qualitative results: CUBS200. Extends Figure 6 in the main paper.


Figure S6: Qualitative results: Indoor67. Extends Figure 6 in the main paper. GT labels are underlined, correct top-1 knockoff predictions are shown in green and incorrect ones in red.

Figure S7: Qualitative results: Diabetic5. Extends Figure 6 in the main paper.


Figure S8: Policies learnt by the adaptive strategy. Supplements Figure 7 in the main paper. (a) Closed world (PA = D2): action probabilities over z ∈ Z, aggregated over t ∈ [0, 2500] (T = 7401), for the Caltech256, Indoor67, CUBS200, and Diabetic5 blackboxes. (b) Closed world: analyzing the CUBS200 policy over time t, shown for the intervals [0, 1000], [1000, 2000], [2000, 3000], and [3000, 4000]. (c) Open world (PA = ILSVRC): action probabilities for the Indoor67, Diabetic5, Caltech256, and CUBS200 blackboxes.

This extension "PV (FV)" is illustrated in Figure S2, displayed alongside the KD approach. The figure represents the sample efficiency of the first two rows of Table 2. Here we observe: (i) comparable performance in all but one case (Diabetic5, discussed shortly), indicating that KD is an effective approach to train knockoffs; (ii) KD achieves better performance on Caltech256 and Diabetic5 due to the regularizing effect of training on soft labels [6] on an imbalanced dataset.


Figure S9: Reward ablation. Supplements Figure 8 in the main paper. Shown for the Indoor67 and Diabetic5 blackboxes with PA = D2 and PA = ILSVRC; rewards: cert, cert + div, cert + div + L, none, uncert.

Figure S10: Per-class evaluation. Per-class evaluation split into seen and unseen classes, shown for Caltech256 and Indoor67.

C.3. Policies learnt by Adaptive

We inspected the policy π learnt by the adaptive strategy in Section 6.1. In this section, we provide policies over all blackboxes in the closed- and open-world settings. Figures S8a and S8c display the probabilities of each action z ∈ Z at t = 2500.

Since the distribution of rewards is non-stationary, we visualize the policy over time in Figure S8b for CUBS200 in a closed-world setup. From this figure, we observe an evolution where: (i) at early stages (t ∈ [0, 2000]), the approach samples (without replacement) images that overlap with the victim's train data; and (ii) at later stages (t ∈ [2000, 4000]), since the overlapping images have been exhausted, the approach explores related images from other datasets, e.g., "ostrich", "jaguar".

C.4. Reward Ablation

The reward ablation experiments (Figure 8 in the main paper) for the remaining datasets are provided in Figure S9. We make similar observations as before for Indoor67. However, since FV = Diabetic5 produces confident predictions on all images, we find little to no improvement for knockoffs of this victim model.

D. Auxiliary Experiments

In this section, we present experiments to supplement existing results in the main paper.

D.1. Effect of CNN Initialization

In our experiments (Section 6), the victim and knockoff models are initialized with ImageNet-pretrained weights¹, a de facto standard when training CNNs with a limited amount of data. In this section, we study the influence of different initializations of the victim and adversary models.

To achieve reasonable performance in our limited-data setting, we perform the following experiments on comparatively smaller models and datasets. We choose three victim blackboxes (all trained from random initialization) using the following datasets: MNIST [9], CIFAR10 [7], and CIFAR100 [7]. We train a LeNet-like model² for MNIST, and ResNet-18 models for CIFAR-10 and CIFAR-100.

While we use the same blackbox model architecture for the knockoff, we either randomly initialize it or pretrain it on a different task. Consequently, in the following experiments, the victim and knockoff have different initializations. We repeat our experiment using the random policy (Section 4.1.1) and using as the query set PA: (a) when PV = MNIST: EMNIST [3] (a superset of MNIST containing alphanumeric characters [A-Z, a-z, 0-9]), EMNISTLetters ([A-Z, a-z]), FashionMNIST [15] (fashion items spanning 10 classes, e.g., trouser, coat), and KMNIST [2] (Japanese Hiragana characters spanning 10 classes); (b) when PV = CIFAR10: CIFAR100 [7] and TinyImageNet200³ (a subset of ImageNet with 500 images for each of 200 classes); and (c) when PV = CIFAR100: CIFAR10 and TinyImageNet200. Note that the output classes of CIFAR10 and CIFAR100 are disjoint.
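A minimal sketch of the two knockoff initializations is given below, assuming a recent torchvision; the auxiliary pretraining loop is omitted, and the helper is illustrative rather than the exact training code used.

```python
# Hedged sketch of the two knockoff initializations compared here: (a) random
# initialization, and (b) pretraining the same architecture on a different task
# (e.g., TinyImageNet200) before stealing.
import torch.nn as nn
from torchvision.models import resnet18

def make_knockoff(num_victim_classes, pretrain_classes=None):
    if pretrain_classes is None:
        # (a) Knockoff trained entirely from scratch.
        return resnet18(weights=None, num_classes=num_victim_classes)
    # (b) Build the model for the auxiliary task, pretrain it on that task
    # (omitted here), then swap the classifier head so the output
    # dimensionality matches the victim blackbox.
    model = resnet18(weights=None, num_classes=pretrain_classes)
    # ... supervised pretraining on the auxiliary dataset goes here ...
    model.fc = nn.Linear(model.fc.in_features, num_victim_classes)
    return model
```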

From Figure S3, we observe: (i) model stealing is possible even when the knockoffs are randomly initialized; for instance, when stealing MNIST, we recover 0.98× the victim accuracy across all choices of PA; (ii) pretraining the knockoff model, even on a different task, improves the sample efficiency of model stealing attacks, e.g., when FV = CIFAR10 (ResNet-18), pretraining the knockoff improves its accuracy after 50k queries from 46.5% to 78.9%.

D.2. Seen and Unseen classes

We now discuss evaluations to supplement Section 5.2.1 and Section 6.1.

In Section 6.1, we highlighted the strong performance of the knockoff even on classes that were never encountered during training (see Table S1 for exact numbers). To elaborate, we split the blackbox output classes into "seen" and "unseen" categories and present mean per-class accuracies in Figure S10. Although we find better performance on classes seen while training the knockoff, the performance on unseen classes is remarkably high, with the knockoff achieving >70% performance in both cases.
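A minimal sketch of this per-class evaluation is shown below; the split into seen and unseen classes follows the overlap in Table S1, and array-based inputs are an assumption.

```python
# Hedged sketch of the evaluation above: mean per-class accuracy computed
# separately over blackbox classes that overlap with the adversary's label set
# ("seen") and those that do not ("unseen").
import numpy as np

def mean_per_class_accuracy(y_true, y_pred, classes):
    accs = []
    for c in classes:
        mask = (y_true == c)
        if mask.any():
            accs.append(float((y_pred[mask] == c).mean()))
    return float(np.mean(accs)) if accs else float("nan")

def seen_unseen_accuracy(y_true, y_pred, seen_classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    seen = set(seen_classes)
    unseen = [c for c in np.unique(y_true) if c not in seen]
    return (mean_per_class_accuracy(y_true, y_pred, sorted(seen)),
            mean_per_class_accuracy(y_true, y_pred, unseen))
```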

¹ Alternatives for ImageNet-pretrained models across a wide range of architectures were not available at the time of writing.
² https://github.com/pytorch/examples/blob/master/mnist/main.py
³ https://tiny-imagenet.herokuapp.com/


Figure S11: Hierarchy. Evaluating adaptive with and without the hierarchy using PA = D2; strategies: PV (KD), adaptive, adaptive-flat, random. The accuracy of the blackbox FV and chance-level performance are also indicated. Shown for Caltech256, CUBS200, Indoor67, and Diabetic5.

Figure S12: Semi-open world: τd and τk (τd, τk ∈ {0.01, 0.1, 0.5, 1.0}; π ∈ {adaptive, random}). Shown for Caltech256, CUBS200, Indoor67, and Diabetic5.


D.3. Adaptive: With and without hierarchy

The adaptive strategy presented in Section 4.1.2 uses a hierarchy discussed in Section 5.2.2. As a result, we approached this as a hierarchical multi-armed bandit problem. Now, we present an alternate approach, adaptive-flat, without the hierarchy. This is simply a multi-armed bandit problem with |Z| arms (actions).
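For illustration, a minimal sketch of such a flat policy is given below. It uses a standard gradient-bandit update (softmax over per-action preferences with a running reward baseline), which is an assumption; it is not necessarily the exact update rule used by adaptive-flat.

```python
# Hedged sketch of a flat multi-armed bandit policy over |Z| actions, in the
# spirit of adaptive-flat: softmax over preferences, gradient-bandit update.
import numpy as np

class FlatBanditPolicy:
    def __init__(self, num_actions, lr=0.1):
        self.H = np.zeros(num_actions)  # per-action preferences
        self.lr = lr
        self.mean_reward = 0.0
        self.t = 0

    def probs(self):
        e = np.exp(self.H - self.H.max())
        return e / e.sum()

    def sample(self):
        return np.random.choice(len(self.H), p=self.probs())

    def update(self, action, reward):
        # Push the sampled action up if the reward beats the running baseline,
        # push all other actions down proportionally to their probability.
        self.t += 1
        self.mean_reward += (reward - self.mean_reward) / self.t
        advantage = reward - self.mean_reward
        p = self.probs()
        self.H -= self.lr * advantage * p
        self.H[action] += self.lr * advantage  # net: lr * advantage * (1 - p[a])
```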

Figure S11 illustrates the performance of these approaches using PA = D2 (|Z| = 2129) and the rewards {certainty, diversity, loss}. We observe that adaptive consistently outperforms adaptive-flat. For instance, on CUBS200, adaptive is 2× more sample-efficient in reaching an accuracy of 50%. We find that the hierarchy helps the adversary (agent) better navigate the large action space.

D.4. Semi-open World

The closed-world experiments (PA = D2) presented in Section 6.1 and discussed in Section 5.2.1 assumed access to the image universe; hence, the overlap between PA and PV was 100%. Now, we present an intermediate-overlap scenario, the semi-open world, by parameterizing the overlap as: (i) τd: the overlap between the images of PA and PV is 100 × τd (in %); and (ii) τk: the overlap between the labels K and Z is 100 × τk (in %). In both cases, τd, τk ∈ (0, 1] represents the fraction of PA used; τd = τk = 1 recovers the closed-world scenario discussed in Section 6.1.
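A minimal sketch of how such overlap fractions can be realized is given below; uniform random subsampling is an assumption, not necessarily the exact protocol used.

```python
# Hedged sketch of the semi-open-world construction: retain a fraction tau_d of
# the adversary's image pool, or a fraction tau_k of the action labels Z.
import random

def subsample(items, tau, seed=0):
    """Keep a fraction tau in (0, 1] of `items`; tau = 1 recovers the closed world."""
    rng = random.Random(seed)
    n = max(1, int(round(tau * len(items))))
    return rng.sample(list(items), n)

# Example usage (hypothetical variables):
# images_pa = subsample(images_pa, 0.1)   # tau_d = 0.1
# labels_z  = subsample(labels_z, 0.5)    # tau_k = 0.5
```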

From Figure S12, we observe: (i) the random strategy is unaffected in the semi-open-world scenario, displaying comparable performance for all values of τd and τk; (ii) τd: the knockoff obtained using adaptive achieves strong performance even with low overlap, e.g., a difference of at most 3% performance on Caltech256 even at τd = 0.1; (iii) τk: although the adaptive strategy is minimally affected in a few cases (e.g., CUBS200), we find that performance drops due to the purely exploitative reward (certainty) used. We observed a recovery in performance when using all rewards, indicating that exploration objectives (diversity, loss) are necessary when transitioning to an open-world scenario.

References

[1] EyePACS. https://www.kaggle.com/c/diabetic-retinopathy-detection. Accessed: 2018-11-08.
[2] Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical Japanese literature, 2018.
[3] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik. EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373, 2017.
[4] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[5] Gregory Griffin, Alex Holub, and Pietro Perona. Caltech-256 object category dataset. 2007.
[6] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv:1503.02531, 2015.
[7] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[8] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig, et al. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982, 2018.
[9] Yann LeCun, Corinna Cortes, and CJ Burges. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In CVPR, 2009.
[12] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
[13] Angelo Vittorio. Toolkit to download and visualize single or multiple classes from the huge Open Images V4 dataset. https://github.com/EscVM/OIDv4_ToolKit, 2018.
[14] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
[15] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.