

arXiv:2007.08457v5 [cs.CR] 31 Mar 2021

Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data

Ning Yu 1,2*  Vladislav Skripniuk 3*  Sahar Abdelnabi 3  Mario Fritz 3

1 University of Maryland  2 Max Planck Institute for Informatics  3 CISPA Helmholtz Center for Information Security

{ningyu,vladislav}@mpi-inf.mpg.de {sahar.abdelnabi,fritz}@cispa.saarland

Abstract

Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs). Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation. While existing research work on deepfake detection demonstrates high accuracy, it is subject to advances in generation techniques and adversarial iterations on detection countermeasure techniques. Thus, we seek a proactive and sustainable solution for deepfake detection that is agnostic to the evolution of generative models, by introducing artificial fingerprints into the models.

Our approach is simple and effective. We first embed artificial fingerprints into training data, then validate a surprising discovery on the transferability of such fingerprints from training data to generative models, which in turn appear in the generated deepfakes. Experiments show that our fingerprinting solution (1) holds for a variety of cutting-edge generative models, (2) has a negligible side effect on generation quality, (3) stays robust against image-level and model-level perturbations, (4) is hard for adversaries to detect, and (5) converts deepfake detection and attribution into trivial tasks, outperforming recent state-of-the-art baselines. Our solution closes the responsibility loop between publishing pre-trained generative model inventions and their possible misuses, which makes it independent of the current arms race.

1. Introduction

In the past years, photorealistic image generation has

been rapidly evolving, benefiting from the invention of generative adversarial networks (GANs) [17] and their successive breakthroughs [39, 18, 35, 5, 25, 26, 27]. Given the level of realism and diversity that generative models can achieve today, detecting generated media, well known as deepfakes,

*Equal contribution.

attributing their sources, and tracing their legal responsibilities have become infeasible for human beings.

Moreover, the misuse of deepfakes has permeated every corner of social media, ranging from misinformation in political campaigns [24] to fake journalism [47, 41]. This motivates tremendous research efforts on deepfake detection [54] and source attribution [34, 52, 49]. These techniques aim to counter the spread of malicious deepfake applications by automatically identifying and flagging generated visual contents and tracking their sources. Most of them rely on low-level visual patterns in GAN-generated images [34, 52, 49] or frequency mismatch [14, 57, 15]. However, these techniques cannot sustainably and robustly prevent deepfake misuse in the long run: as generative models evolve, they learn to better match the true distribution, producing fewer artifacts [54]. Besides, detection countermeasures are also continuously evolving [13, 7, 54].

Motivated by this, we tackle deepfake detection and attribution through a different lens, and propose a proactive and sustainable detection solution that is simple and effective. Specifically, we aim to introduce artificial fingerprints into generative models that enable identification and tracing. Figure 1 depicts our pipeline: we first embed artificial fingerprints into the training data using image steganography [4, 44]. The generative model is then trained with its original protocol without modification, which makes our solution agnostic and plug-and-play for arbitrary models. We then show a surprising discovery on the transferability of such fingerprints from training data to the model: the same fingerprint information that was encoded in the training data can be decoded from all generated images.

We achieve deepfake detection by classifying images whose decoded fingerprints match our database as fake, and images with random decoded fingerprints as real. We also achieve deepfake attribution by allocating different fingerprints to different generative models. Our solution thus closes the responsibility loop between generative model inventions and their possible misuses.



Figure 1: Our solution pipeline consists of four stages. We first train an image steganography encoder and decoder. Then we use the encoder to embed artificial fingerprints into the training data. After that, we train a generative model with its original protocol. Finally, we decode the fingerprints from the generated deepfakes.

It prevents the misuse of published pre-trained generative models by enabling inventors to proactively and responsibly embed artificial fingerprints into the models.

We summarize our contributions as follows:

(1) We synergize the two previously uncorrelated domains, image steganography and GANs, and propose the first proactive and sustainable solution for the third, emerging domain: deepfake detection and attribution.

(2) This is the first study to demonstrate the transferability of artificial fingerprints from training data to generative models and then to all the generated deepfakes. Our discovery is non-trivial: only deep-learning-based fingerprinting techniques [4, 44] are transferable to generative models, while conventional steganography and watermarking techniques [2, 1] are not. See Section 5.2 for comparisons.

(3) We empirically validate several beneficial properties of our solution. Universality (Section 5.2): it holds for a variety of cutting-edge generative models [25, 26, 27, 5, 36]. Fidelity (Section 5.3): it has a negligible side effect on generation quality. Robustness (Section 5.4): it stays robust against many perturbations. Secrecy (Section 5.5): the artificial fingerprints are hard for adversaries to detect. Anti-deepfake (Sections 5.6 and 5.7): it converts deepfake detection and attribution into trivial tasks and outperforms the state-of-the-art baselines [52, 49].

2. Related Work

Generative adversarial networks (GANs). GANs [17] were first proposed as a workaround to model the intractable real data distribution. Iterative improvements have pushed generation realism to brand-new levels [39, 18, 35, 5, 25, 26, 27]. Successes have also spread to many other vision tasks (e.g., [37, 29, 23, 59, 60, 36, 51]). In Section 5, we focus on three categories of cutting-edge generative models: unconditional (ProGAN [25], StyleGAN [26], and StyleGAN2 [27]), class-conditional (BigGAN [5]), and image-conditional, i.e., image-to-image translation (CUT [36]).

Image steganography and watermarking. Image steganography and watermarking hide information in carrier images [16]. Previous techniques rely on the Fourier transform [12, 8], JPEG compression [2, 1], or least significant bit modification [38, 21, 22]. Recent works replace hand-crafted hiding procedures with neural network encoding [4, 19, 48, 58, 56, 44, 33]. We leverage recent deep-learning-based steganography methods [4, 44] to embed artificial fingerprints into training data, and validate their transferability to generative models. This is non-trivial because only deep-learning-based fingerprints are transferable to generative models, while conventional ones [2, 1] are not (Section 5.2). Besides, the stealthiness achieved by steganography preserves the original generation quality (Section 5.3) and fingerprint secrecy (Section 5.5).

Our fingerprinting is conceptually and functionally orthogonal to all of them. Instead of encoding information into the pixels of individual images, our solution encodes information into generator parameters such that all the generated images are entangled with that information. Compared to a pipeline of a generator followed by a watermarking module, our solution introduces zero generation overhead and obstructs adversarial model surgery that aims to detach watermarking from image generation.

Network watermarking. Different from image watermarking, network watermarking aims to hide information in model parameters without affecting the model's original performance, similar in spirit to our goal. There are two categories: black-box trigger-set-based solutions [3, 55], and white-box feature-based solutions [46, 10, 43]. The former embed watermarks through a trigger set of inputs and decode watermarks according to the input-output behavior of the model.


The latter directly embed watermarks in the model parameter space with transformation matrices. It is worth noting that our solution is conceptually and technically distinct from network watermarking. In terms of concepts, previous works target only discriminative models (e.g., classification), while a solution for generative models is urgently lacking. In terms of techniques, to adapt to generator watermarking, we tune our solution to indirectly transfer fingerprints from training data to model parameters. This is because (1) unconditional generative models do not accept deterministic inputs, so a trigger set is not applicable, and (2) transformations in the parameter space are not agnostic to model configurations, so they are neither scalable nor sustainable along with the evolution of generative models.

Deepfake detection and attribution. Images generated by GAN models bear unique patterns. [34] shows that generative models leave unique noise residuals in generated samples, which allows deepfake detection. [52] moves one step further, using a neural network classifier to attribute different images to their sources. [49] also trains a classifier and improves generalization across different generation techniques. [57, 14, 13] point out that high-frequency pattern mismatch can be used for deepfake detection, as can texture feature mismatch [32]. However, these cues are not sustainable due to the advancement of detection countermeasures. For example, spectral regularization [13] was proposed to narrow the frequency mismatch and results in a significant detection deterioration. Also, detectors [49] are vulnerable to adversarial evasion attacks [7].

In contrast to these previous passive approaches, we propose a novel proactive solution for model fingerprinting and, thus, for deepfake detection. We differentiate between our term artificial fingerprints, which refers to the information we deliberately and proactively embed into the model, and the term GAN fingerprints [52], which refers to the inherent cues and artifacts of different GAN models. Our work is also distinct from a follow-up proactive technique [53]: they focus on fingerprinting scalability and efficiency, while we focus more fundamentally on its transferability and universality.

3. Problem Statement

Generation techniques can be misused to create misinformation at scale for financial or political gains. Recently, there have been concerns about releasing generative models. For example, OpenAI employed a staged release to evaluate the potential risks of their GPT-2 model [40]. GPT-3 was later released as a black-box API only [6]. The Face2Face [45] authors did not open-source their real-time face capture and reenactment.

We design our solution from the model inventors' side (e.g., OpenAI). Our solution introduces traceable artificial fingerprints into generative models. It enables deepfake detection and attribution by decoding the fingerprints from the generated images and matching them to the known fingerprints given to different models. This equips model inventors with a means for proactive and responsible disclosure when publishing their pre-trained models. This distinguishes our model fingerprinting solution from watermarking the generated images: we aim to defend against the misuse of published generative models rather than of single deepfake media.

In practice, the training is done by the model inventor. Responsible model inventors, unlike malicious deepfake users, should be willing to adopt a proactive solution to fingerprint their generative models against potential deepfake misuse. The fingerprinting encoder and decoder, and the unique fingerprints given to different models, are privately maintained by the model inventor. Once a deepfake misuse happens, the inventor is able to verify whether it was generated by one of their models. If so, they can further attribute it to a specific model user. They can then prohibit that user's access to the model and/or seek legal remedies. Thus, they can claim responsible disclosure with a countermeasure against potential misuse when they publish their models.

4. Artificial Fingerprints

The goal of image attribution is to learn a mapping D_0(x) ↦ y that traces the source y ∈ Y = {real, G_1, ..., G_N} of an image x. If the domain Y is limited, predefined, and known to us, this is a closed-world scenario and attribution can simply be formulated as a multi-label classification problem, each label corresponding to one source, as conducted in [52]. However, Y can be unlimited, undefined, continuously evolving, and agnostic to us. This open-world scenario is intractable using discriminative learning. To make our solution agnostic to the selection of generative models, we formulate attribution as a regression mapping D(x) ↦ w, where w ∈ {0,1}^n is the source identity space and n is its dimension. We propose a pipeline to root the attribution down to the training dataset x ∈ X and close the loop of the regression D. We describe the pipeline stages (depicted in Figure 1) below:

Steganography training. The source identity is represented by the artificial fingerprint w. We use a steganography system [4, 44] to learn an encoder E(x, w) ↦ x_w that embeds an arbitrary fingerprint w (randomly sampled during training) into an arbitrary image x. We couple E with a decoder D(x_w) ↦ w to detect the fingerprint information from the image. E and D are formulated as convolutional neural networks with the following training losses:

$$\min_{E,D}\; \mathbb{E}_{x\sim X,\, w\sim\{0,1\}^n}\, \mathcal{L}_{\mathrm{BCE}}(x,w;E,D) + \lambda\, \mathcal{L}_{\mathrm{MSE}}(x,w;E) \tag{1}$$

$$\mathcal{L}_{\mathrm{BCE}}(x,w;E,D) = -\frac{1}{n}\sum_{k=1}^{n}\big(w_k \log \hat{w}_k + (1-w_k)\log(1-\hat{w}_k)\big) \tag{2}$$

$$\mathcal{L}_{\mathrm{MSE}}(x,w;E) = \|E(x,w) - x\|_2^2 \tag{3}$$

$$\hat{w} = D\big(E(x,w)\big) \tag{4}$$

where w_k and ŵ_k are the k-th bits of the input fingerprint and the detected fingerprint, respectively, and λ is a hyper-parameter balancing the two objective terms. The binary cross-entropy term L_BCE guides the decoder to decode the fingerprint embedded by the encoder. The mean squared error term L_MSE penalizes any deviation of the stego image E(x, w) from the original image x. The architectures of E and D are depicted in the supplementary material.
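For concreteness, the following PyTorch-style sketch shows one joint optimization step for Eq. 1. The encoder E, decoder D, and 100-bit fingerprints follow the descriptions above; the function name and loader conventions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def steganography_step(E, D, optimizer, images, lam, n_bits=100):
    """One joint optimization step for Eq. 1 (sketch)."""
    # Sample a random fingerprint per image, as done during training.
    w = torch.randint(0, 2, (images.size(0), n_bits),
                      device=images.device).float()
    stego = E(images, w)                          # E(x, w) -> x_w
    w_hat = D(stego)                              # soft bits in (0, 1)
    loss_bce = F.binary_cross_entropy(w_hat, w)   # Eq. 2 (mean over bits)
    loss_mse = F.mse_loss(stego, images)          # Eq. 3
    loss = loss_bce + lam * loss_mse              # Eq. 1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```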

Artificial fingerprint embedding. In this stage, we use the well-trained E and D networks. We allocate each training dataset X a unique fingerprint w. We apply the trained E to each training image x and collect a fingerprinted training dataset X_w = {E(x, w) | x ∈ X}.
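A minimal sketch of this stage, assuming a trained encoder E and a batch of images as tensors; build_fingerprinted_dataset is a hypothetical helper name.

```python
import torch

@torch.no_grad()
def build_fingerprinted_dataset(E, images, w):
    """Compute X_w = {E(x, w) | x in X} for one dataset's fingerprint w."""
    w_batch = w.unsqueeze(0).expand(images.size(0), -1)  # same w for all x
    return E(images, w_batch)
```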

Generative model training. To obtain a solution that is agnostic to the evolution of generative models, we intentionally do not intervene in their training. This makes our solution plug-and-play for arbitrary generation tasks without touching their implementations, and introduces zero overhead to model training. We simply replace X with X_w to train the generative model under its original protocol.

Artificial fingerprint decoding. We hypothesize the transferability of our artificial fingerprints from training data to generative models: a well-trained generator G_w(z) ↦ x_w contains, in all generated images, the same fingerprint information w (as embedded in the training data x_w). We justify this hypothesis in Section 5.2. As a result, the artificial fingerprint can be recovered from a generated image x_w using the decoder D: D(x_w) ≡ w. Based on this transferability, we can formulate deepfake attribution as fingerprint matching using our decoder D.

Artificial fingerprint matching. To support robustness to post-generation modifications that could be applied to the generated images, we relax the matching of the decoded artificial fingerprints to a soft matching. We perform a null hypothesis test given the number of matching bits k between the decoded fingerprint ŵ and the fingerprint w used in generative model training. The null hypothesis H_0 is that this number of successes (i.e., matching bits) occurs by chance. Under the null hypothesis, the number of matching bits (random variable X) follows a binomial distribution: the number of trials n is the number of bits in the fingerprint sequence, k is the number of successes, and each bit has a 0.5 probability of success. We can then measure the p-value of the hypothesis test by computing the probability of

getting k or higher matching bits under the null hypothesis:

$$\Pr(X \geq k \mid H_0) = \sum_{i=k}^{n} \binom{n}{i}\, 0.5^n \tag{5}$$

The fingerprint is verified, ŵ ∼ w, if the null hypothesis results in a very low probability (p-value). Typically, when the p-value is smaller than 0.05, we reject the null hypothesis and regard 1 − p as the verification confidence.
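The test of Eq. 5 reduces to a standard binomial survival function; a minimal sketch using scipy, with a hypothetical helper name:

```python
from scipy.stats import binom

def fingerprint_pvalue(w_decoded, w_reference):
    """Null hypothesis test of Eq. 5 on two equal-length bit sequences."""
    n = len(w_reference)
    k = sum(int(a == b) for a, b in zip(w_decoded, w_reference))
    # Survival function: P(X > k - 1) = P(X >= k) under Binomial(n, 0.5).
    return binom.sf(k - 1, n, 0.5)

# Sanity check against Section 5.6: 75 matching bits out of 100 gives
# binom.sf(74, 100, 0.5), approximately 2.8e-7.
```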

5. Experiments

We describe the experimental setup in Section 5.1. We first evaluate the required properties of our solution: the transferability and universality of our artificial fingerprints in Section 5.2, their fidelity in Section 5.3, their robustness in Section 5.4, and their secrecy in Section 5.5. The transferability in turn enables accurate deepfake detection and attribution, which are evaluated and compared in Sections 5.6 and 5.7, respectively. In addition, we articulate our network designs and training details in the supplementary material.

5.1. Setup

Generative models. As a proactive solution, ours should be agnostic to generative models. Without loss of representativeness, we focus on three generation applications with their state-of-the-art models. For unconditional generation: ProGAN [25], StyleGAN [26], and StyleGAN2 [27]; for class-conditional generation: BigGAN [5]; for image-conditional generation, i.e., image-to-image translation: CUT [36]. Each model is trained from scratch with its official implementation.

Datasets. Each generation application benchmarks its own datasets. For unconditional generation, we train/test on 150k/50k CelebA [31] images at 128×128 resolution, 50k/50k LSUN Bedroom [50] at 128×128 resolution, and, the most challenging one, 50k/50k LSUN Cat [50] at its original 256×256 resolution. For class-conditional generation, we experiment on the entire CIFAR-10 dataset [28] with the original training/testing split at the original 32×32 resolution. For image-conditional generation, we experiment on the entire Horse→Zebra dataset [59] and Cat→Dog dataset [11] with the original training/testing splits at the original 256×256 resolution. We only need to fingerprint images from the target domains.

5.2. Transferability

Transferability means that the artificial fingerprints embedded in the training data also appear consistently in all the generated data. This is a non-trivial hypothesis from Section 4 and needs to be justified by the fingerprint detection accuracy.

Evaluation. Fingerprints are represented as binary vectors w ∈ {0,1}^n. We use bitwise accuracy to evaluate the detection accuracy.


We set n = 100 as suggested in [44]. We also report the p-value for the confidence of detection.
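A minimal sketch of the bitwise accuracy metric, assuming the decoder's soft outputs and the reference fingerprint are torch tensors; the helper name is illustrative.

```python
def bitwise_accuracy(w_soft, w_true):
    """Fraction of fingerprint bits recovered after thresholding at 0.5."""
    bits = (w_soft > 0.5).float()
    return (bits == w_true).float().mean().item()
```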

Baselines. For comparison, we implement a straightforward baseline method. Instead of embedding fingerprints into training data, we enforce fingerprint generation jointly with model training. That is, we train on clean data, and enforce generated images to not only approximate real training images but also contain a specific fingerprint. Mathematically,

$$\min_{G,D} \max_{Dis}\; \mathbb{E}_{z\sim\mathcal{N}(0,I),\, x\sim X}\, \mathcal{L}_{\mathrm{adv}}(z,x;G,Dis) + \eta\, \mathbb{E}_{z\sim\mathcal{N}(0,I),\, w\sim\{0,1\}^n}\, \mathcal{L}_{\mathrm{BCE}}(z,w;G,D) \tag{6}$$

where G and Dis are the original generator and discriminator in the GAN framework, L_adv is the original GAN objective, and L_BCE is adapted from Eq. 2 by replacing ŵ = D(E(x, w)) with ŵ = D(G(z)). η is set to 1.0 as a hyper-parameter balancing the two objective terms.

We also compare the deep-learning-based steganography technique used in our solution ([44]) to two well-established, non-deep-learning steganographic methods [2, 1] that alter the frequency coefficients of JPEG compression.

Results. We report the fingerprint detection performance in the fourth and fifth columns of Table 1. We observe:

(1) The "Data" row shows the detection accuracy on real testing images as a sanity check: it reaches the saturated 100% accuracy, indicating the effectiveness of the steganography technique by its nature.

(2) Our artificial fingerprints can be almost perfectly and confidently detected from generated images over a variety of applications, generative models, and datasets. The accuracy is ≥ 0.98 except for ProGAN on LSUN Bedroom, but its 0.93 accuracy and 10^-19 p-value are more than sufficient to verify the presence of fingerprints. Our hypothesis on the transferability from training data to generative models (i.e., generated data) is therefore justified. As a result, artificial fingerprints are qualified for deepfake detection and attribution.

(3) The universality of fingerprint transferability over varying tasks and models validates that our solution is agnostic to generative model techniques.

(4) The baseline of joint fingerprinting and generation training (first row) is also moderately effective in terms of fingerprint detection, but we show in Section 5.3 that it leads to strong deterioration of generation quality.

(5) Conventional steganography methods [2, 1] (second and third rows) do not transfer hidden information into models, as indicated by the random-guess performance during decoding. We attribute this to the discrepancy between deep generation techniques and shallow steganography techniques. We reason that generative models leverage deep discriminators to approximate common image patterns including low-level fingerprints.

Dataset | Fgpt tech | Model | Bit acc ⇑ | p-value | Orig FID | Fgpt FID ⇓
CelebA | Eq. 6 | ProGAN | 0.93 | < 10^-19 | 14.09 | 60.28
CelebA | [2] | StyleGAN2 | 0.51 | 0.46 | 6.41 | 6.93
CelebA | [1] | StyleGAN2 | 0.53 | 0.31 | 6.41 | 6.82
CelebA | [44] | Data | 1.00 | - | - | 1.15
CelebA | [44] | ProGAN | 0.98 | < 10^-26 | 14.09 | 14.38
CelebA | [44] | StyleGAN | 0.99 | < 10^-28 | 8.98 | 9.72
CelebA | [44] | StyleGAN2 | 0.99 | < 10^-28 | 6.41 | 6.23
LSUN Bedroom | [44] | ProGAN | 0.93 | < 10^-19 | 29.16 | 32.58
LSUN Bedroom | [44] | StyleGAN | 0.98 | < 10^-26 | 24.95 | 25.71
LSUN Bedroom | [44] | StyleGAN2 | 0.99 | < 10^-28 | 13.92 | 14.71
LSUN Cat | [44] | ProGAN | 0.98 | < 10^-26 | 45.22 | 48.97
LSUN Cat | [44] | StyleGAN | 0.99 | < 10^-28 | 33.45 | 34.01
LSUN Cat | [44] | StyleGAN2 | 0.99 | < 10^-28 | 31.01 | 32.60
CIFAR-10 | [44] | BigGAN | 0.99 | < 10^-28 | 6.25 | 6.80
Horse→Zebra | [44] | CUT | 0.99 | < 10^-28 | 22.98 | 23.43
Cat→Dog | [44] | CUT | 0.99 | < 10^-28 | 55.78 | 56.09

Table 1: Artificial fingerprint detection in bitwise accuracy (⇑ indicates higher is better) and generation quality in FID (⇓ indicates lower is better). The "Data" row corresponds to real testing images for a sanity check. The "Orig FID" column corresponds to the original (non-fingerprinted) models for reference. The first three rows are the baselines.

Only comparably deep fingerprinting techniques, e.g., [44], are capable of hiding and transferring fingerprints to the models, while hand-crafted image processing is not effective. Therefore, the transferability of our fingerprinting is non-trivial.

5.3. Fidelity

The fidelity of generated images is as critical as the transferability. Fingerprinting should have a negligible side effect on the functionality of generative models. This preserves the original generation quality and avoids arousing an adversary's suspicion about the presence of fingerprints. The steganography technique we use should enable this, which we validate empirically.

Evaluation. We use Fréchet Inception Distance (FID) [20] to evaluate generation quality; the lower, the more realistic. We measure FID between a set of 50k generated images and a set of 50k real non-fingerprinted images, in order to evaluate the quality of the generated set. When calculating different FIDs for each dataset, the real set is unchanged.

Results. We compare the generation quality of the original and fingerprinted generative models in the sixth and seventh columns of Table 1. We observe:

(1) The "Data" rows are for sanity checks: embedding fingerprints into real images does not substantially deteriorate image quality; FID ≤ 1.15 is in an excellent realism range. This validates the secrecy of the steganographic technique and lays a valid foundation for high-quality model training.



Figure 2: CelebA samples at 128×128 for the last two columns of Table 1. (a) Original real training samples. (b) Fingerprinted real training samples. (c) The difference between (a) and (b), 10× magnified for easier visualization. (d) Samples from the non-fingerprinted ProGAN. (e) Samples from the fingerprinted ProGAN. See more samples on the other datasets in the supplementary material.

(2) For a variety of settings, the performance of the fingerprinted generative models sticks tightly to the original limits of their non-fingerprinted baselines. The heaviest deterioration is as small as +3.75 FID, occurring for ProGAN on LSUN Cat. In practice, the generated fingerprints are imperceptibly hidden in the generated images and can only be perceived under 10× magnification. See Figure 2 and the supplementary material for demonstrations. Therefore, the fidelity of fingerprinted models is justified, and it qualifies our solution for deepfake detection and attribution.

(3) The baseline of joint fingerprinting and generation training (first row) deteriorates generation quality remarkably. This indicates model fingerprinting is a non-trivial task: direct fingerprint reconstruction distracts adversarial training. In contrast, our solution leverages image steganography and fingerprint transferability, sidesteps this issue, and leads to better performance.

5.4. Robustness

Deepfake media and generative models may undergo post-processing or perturbations during broadcast. We validate the robustness of our fingerprint detection under a variety of image and model perturbations, and investigate the corresponding working ranges.

Perturbations. We evaluate robustness against four types of image perturbation: additive Gaussian noise, blurring with a Gaussian kernel, JPEG compression, and center cropping. We also evaluate robustness against two types of model perturbation: model weight quantization and adding Gaussian noise to model weights. For quantization, we round each model weight to a given decimal precision. We vary the amount of perturbation, apply each to the generated images or to the model directly, and detect the fingerprint using the pre-trained decoder.
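The two model-level perturbations can be sketched in PyTorch as follows; perturb_model is a hypothetical helper, and the exact quantization scheme used by the authors may differ from this simple rounding.

```python
import torch

@torch.no_grad()
def perturb_model(G, decimals=None, noise_std=None):
    """Apply the two model-level perturbations to a generator in place."""
    for p in G.parameters():
        if decimals is not None:                     # weight quantization
            scale = 10.0 ** decimals
            p.copy_(torch.round(p * scale) / scale)
        if noise_std is not None and noise_std > 0:  # weight noise
            p.add_(noise_std * torch.randn_like(p))
    return G
```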

Results. We evaluate artificial fingerprint detection over 50k images from a fingerprinted ProGAN. We plot the bitwise accuracy w.r.t. the amount of perturbation in Figure 3 (see the supplementary material for additional results of ProGAN trained on LSUN Bedroom). We observe:

(1) For all the image perturbations, fingerprint detection accuracy drops monotonically as we increase the amount of perturbation, while for small perturbations accuracy drops rather slowly. We accept accuracy ≥ 75% as a threshold (p-value = 2.8 × 10^-7). This results in the following working range for each perturbation: Gaussian noise standard deviation in [0.0, 0.05], Gaussian blur kernel size in [0, 5], JPEG compression quality in [50, 100], center cropping size in [86, 128], quantization decimal precision ≤ 10^-1, and model noise standard deviation in [0.0, 0.18]. These are reasonably wide ranges in practice.

(2) For image perturbations (the left four subplots) outside the above working ranges, the reference upper bounds drop even faster and the margins to the testing curves shrink quickly, indicating that the detection deterioration is irrelevant to model training but rather relevant to the heavy quality deterioration of the training images.

(3) For model perturbations (the right two subplots) outside the above working ranges, image quality deteriorates faster than fingerprint accuracy: even before the accuracy drops below 75%, FID has already increased by > 500%.

(4) As a result of (2) and (3), before fingerprint detection degenerates close to random guessing (~50% accuracy), image quality has already been heavily deteriorated by strong perturbations (Figure 4), which indicates that our fingerprints are more robust than the image functionality itself under the studied perturbations.

Discussion on attacks. Other attacks that require training counter-models are conceivable, for example, training a model that removes the fingerprints from generated images (e.g., a denoising autoencoder). However, this would require attackers to have paired training images before and after fingerprint embedding. In our scenario, we assume that the fingerprint encoder is not released, which prevents


Figure 3: Red plots show artificial fingerprint detection in bitwise accuracy w.r.t. the amount of perturbation for ProGAN trained on CelebA. In the left four plots (robustness against image perturbations), blue dots represent detection accuracy on the fingerprinted real training images, which serve as upper-bound references for the red dots. See the supplementary material for additional results of ProGAN trained on LSUN Bedroom. In the right two plots (robustness against model perturbations), blue dots represent the FID of images generated from the perturbed models.

(a) Original: 0.99 bit acc. (b) Gaussian noise, std 0.1: 0.77 bit acc. (c) Blurring, kernel size 5: 0.75 bit acc. (d) JPEG, quality 35%: 0.75 bit acc. (e) Cropping, crop size 64: 0.80 bit acc. (f) Quantization, precision 10^0: 0.64 bit acc. (g) Model noise, std 0.16: 0.77 bit acc.

Figure 4: Perturbed image samples from the fingerprinted ProGAN and the corresponding fingerprint detection accuracy. Detection still performs robustly (bitwise accuracy ≥ 0.75) even when image quality heavily deteriorates.

this training data collection.

5.5. Secrecy

The presence of a fingerprint embedded in a generative model should not be easily detected by a third party; otherwise, it could potentially be manipulated.

Attack. To quantify secrecy, we formulate it as a binary classification problem: detecting the presence of a fingerprint in an image. We follow the attack protocol in [58] to perform the Artificial Training Sets (ATS) attack [30]. The attacker aims to classify between fingerprinted and non-fingerprinted test images without supervision. The intuition is to train another steganography encoder to expand the test set and establish an artificial setting with known labels, which enables supervised training. The original test class space is then a subspace of the artificial training class space and is separable by the training task.

The attack is as follows: we independently train another steganography encoder. We consider the original test images as negative training samples. Then, we apply the encoder twice to the test set to obtain extra images fingerprinted 2 times (corresponding to originally non-fingerprinted images) or 3 times (corresponding to originally fingerprinted images), which are regarded as positive training samples. We then train an SVM classifier [9] on these positive and negative samples in order to separate images fingerprinted 0-1 times (the original set) from images fingerprinted 2-3 times (the artificial training set).

During testing, we first apply the encoder once to the test images so that the originally non-fingerprinted images are now fingerprinted 1 time (belonging to the 0-1 class), and the originally fingerprinted images are now fingerprinted 2 times (belonging to the 2-3 class). Then we can use the classifier to separate them and propagate the predictions back to the original images. We evaluate the attack on a set of 250 fingerprinted deepfake images and 250 non-fingerprinted deepfake images.
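A condensed sketch of this protocol, assuming an attacker-side encoder callable `encode` that operates on flattened image arrays; using raw pixels as SVM features is a simplification of the steganalysis features typically used, so this is an illustration of the label-shifting logic rather than the exact attack setup.

```python
import numpy as np
from sklearn.svm import SVC

def ats_attack(encode, test_images):
    """ATS attack sketch: test_images are (N, D) flattened pixel arrays."""
    neg = test_images                       # fingerprinted 0-1 times
    pos = encode(encode(test_images))       # fingerprinted 2-3 times
    X = np.concatenate([neg, pos], axis=0)
    y = np.concatenate([np.zeros(len(neg)), np.ones(len(pos))])
    clf = SVC().fit(X, y)
    # One more encoding pass shifts 0-time images into the 0-1 class
    # (now 1 time) and 1-time images into the 2-3 class (now 2 times),
    # so predictions on the shifted set transfer back to the originals.
    return clf.predict(encode(test_images))  # 1 -> flagged as fingerprinted
```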

Results. The binary classification accuracy on the existence of fingerprints is 0.502 under the ATS attack, which is close to random guessing. This indicates our fingerprinting is secret enough to avoid detection by adversaries who have no access to our encoder and decoder. We reason that a steganography encoder trained from a different initialization uses different patterns to hide the fingerprint, and therefore does not couple well with the victim encoder. This also supports our previous discussion on the importance of keeping the encoder private for both the secrecy and the robustness of the artificial fingerprints.

5.6. Deepfake Detection

In the previous sections, we showed that our fingerprinting solution is effective in transferring the fingerprints and meets the other required criteria. We now discuss how to use it for deepfake detection and attribution.


Unlike existing methods that detect intrinsic differences between the real and deepfake classes [52, 57, 49, 13], we, acting on behalf of model inventors, propose a proactive solution: embedding artificial fingerprints into generative models and consequently into the generated images. In practice, responsible model inventors, unlike malicious deepfake users, should be willing to do so. We then convert the problem to verifying whether a decoded fingerprint is in our fingerprint regulation database. Even with imperfect detection accuracy, we can still use our solution based on the null hypothesis test in Section 4. We consider a deepfake verified given ≥ 75% bit matching (p-value = 2.8 × 10^-7). This is feasible based on two assumptions: (1) the decoded fingerprint of a real image is random; and (2) the fingerprint capacity is large enough that the random fingerprint from a real image is unlikely to collide with a regulated fingerprint in the database. The second condition is trivial to satisfy: we sample fingerprints w ∈ {0,1}^n with n = 100, and 2^100 is a large enough capacity. We then validate the first assumption with the deepfake detection experiments below.

Baselines. We compare to two recent state-of-the-art CNN-based deepfake detectors [52, 49] as baselines. [52] is trained on 40k real images and 40k generated images, equally drawn from four generative models with distinct fingerprints. We consider the open-world scenario where disjoint generative models are used in training and testing, to challenge the classifier's generalization. For [49], we use the officially released model because they already claim improved generalization across different generation techniques.

Results. We compare our solution to the two baselines on a variety of generation applications, models, and datasets. We test on 4k real images and 4k generated images, equally drawn from four generative models with distinct fingerprints. We report deepfake detection accuracy in the fourth column of Table 2. We observe:

(1) Our solution performs perfectly (100% accuracy) in all cases, turning open-world deepfake detection into a trivial fingerprint detection and matching problem.

(2) [52] deteriorates to random guessing (~50% accuracy) because of the domain gap between training and testing models. In contrast, our solution benefits from being agnostic to generative models: it depends only on the presence of fingerprints rather than on discriminative cues that are overfitted during training.

(3) Our solution outperforms [49] by clear margins. In particular, [49] degenerates when model techniques evolve to be more powerful (from ProGAN to StyleGAN2) or are conditioned on input guidance. On the contrary, our proactive solution keeps pace with this evolution with high fingerprint detection accuracy and, therefore, perfect deepfake detection accuracy.

Dataset | Model | Detector | Detection acc ⇑ | Attribution acc ⇑
CelebA | ProGAN | [52] | 0.508 | 0.235
CelebA | ProGAN | [49] | 0.924 | N/A
CelebA | ProGAN | Ours | 1.000 | 1.000
CelebA | StyleGAN | [52] | 0.497 | 0.168
CelebA | StyleGAN | [49] | 0.906 | N/A
CelebA | StyleGAN | Ours | 1.000 | 1.000
CelebA | StyleGAN2 | [52] | 0.500 | 0.267
CelebA | StyleGAN2 | [49] | 0.895 | N/A
CelebA | StyleGAN2 | Ours | 1.000 | 1.000
LSUN Bedroom | ProGAN | [52] | 0.493 | 0.597
LSUN Bedroom | ProGAN | [49] | 0.952 | N/A
LSUN Bedroom | ProGAN | Ours | 1.000 | 1.000
LSUN Bedroom | StyleGAN | [52] | 0.499 | 0.366
LSUN Bedroom | StyleGAN | [49] | 0.956 | N/A
LSUN Bedroom | StyleGAN | Ours | 1.000 | 1.000
LSUN Bedroom | StyleGAN2 | [52] | 0.491 | 0.267
LSUN Bedroom | StyleGAN2 | [49] | 0.930 | N/A
LSUN Bedroom | StyleGAN2 | Ours | 1.000 | 1.000
LSUN Cat | ProGAN | [49] | 0.951 | N/A
LSUN Cat | ProGAN | Ours | 1.000 | 1.000
LSUN Cat | StyleGAN | [49] | 0.923 | N/A
LSUN Cat | StyleGAN | Ours | 1.000 | 1.000
LSUN Cat | StyleGAN2 | [49] | 0.905 | N/A
LSUN Cat | StyleGAN2 | Ours | 1.000 | 1.000
CIFAR-10 | BigGAN | [49] | 0.815 | N/A
CIFAR-10 | BigGAN | Ours | 1.000 | 1.000
Horse→Zebra | CUT | [49] | 0.836 | N/A
Horse→Zebra | CUT | Ours | 1.000 | 1.000
Cat→Dog | CUT | [49] | 0.902 | N/A
Cat→Dog | CUT | Ours | 1.000 | 1.000

Table 2: Deepfake detection and attribution accuracy (⇑ indicates higher is better). [49] is not applicable to the multi-source attribution scenarios in the last column.

(4) In general, although [49] generalizes better than [52], it is still subject to future adversarial evolution of generative models, which has progressed rapidly over the last few years. For example, [49] was effectively evaded in [7] by extremely small perturbations. In contrast, our work offers higher sustainability in the long run by proactively enforcing a margin between real and generated images. This requires and enables responsible model inventors' disclosure against potential misuse of their models.

5.7. Deepfake Attribution

The goal of attribution is to trace the model source that generated a deepfake. It upgrades the binary classification in detection to multi-class classification. Our artificial fingerprint solution can easily be extended for attribution and enables us, acting on behalf of model inventors, to attribute responsibility to our users when misuse occurs.

Baseline. [49] is not applicable to multi-source attribution. We only compare to [52] in the open-world scenario,


i.e., the training and testing sets of generative models are not fully overlapping. Given 40k generated images equally from four generative models with distinct fingerprints, we use [52] to train four one-vs-all-others binary classifiers. During testing, all four classifiers are applied to an image. We assign the image to the class with the highest confidence if not all the classifiers reject it. Otherwise, it is assigned the unknown label.

Results. We compare our solution to [52] on CelebA and LSUN Bedroom. We test on 4k/4k generated images, equally drawn from four model sources that are in/out of the training set of [52]. We report deepfake attribution accuracy in the last column of Table 2. We reach the same findings and conclusions as for deepfake detection in Section 5.6: open-world attribution deteriorates for the CNN classifier [52], while our fingerprinting solution maintains perfect (100%) accuracy.

6. Conclusion

Detecting deepfakes is a complex problem due to the rapid development of generative models and possible adversarial countermeasure techniques. For the sake of sustainability, we investigate a proactive solution on the model inventors' side to make deepfake detection agnostic to generative models. We root deepfake detection in training data, and demonstrate the transferability of artificial fingerprints from training data to a variety of generative models. Our empirical study shows several beneficial properties of the fingerprints, including universality, fidelity, robustness, and secrecy. Experiments demonstrate perfect detection and attribution accuracy, outperforming two recent state-of-the-art baselines. As there have been recent concerns about the release of powerful generative techniques, our solution closes the responsibility loop between publishing pre-trained generative model inventions and their possible misuses. It opens up possibilities for inventors' responsible disclosure by allocating each model a unique fingerprint.

7. Acknowledgement

Ning Yu is partially supported by a Twitch Research Fellowship. Vladislav Skripniuk is partially supported by an IMPRS scholarship from the Max Planck Institute. We thank David Jacobs, Matthias Zwicker, Abhinav Shrivastava, Yaser Yacoob, and Apratim Bhattacharyya for constructive discussion and advice.

References

[1] steghide. http://steghide.sourceforge.net.
[2] outguess. http://www.outguess.org/.
[3] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX, 2018.
[4] Shumeet Baluja. Hiding images in plain sight: Deep steganography. In NeurIPS, 2017.
[5] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In ICLR, 2018.
[6] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv, 2020.
[7] Nicholas Carlini and Hany Farid. Evading deepfake-image detectors with white- and black-box attacks. In CVPR Workshops, 2020.
[8] Francois Cayre, Caroline Fontaine, and Teddy Furon. Watermarking security: theory and practice. In TSP, 2005.
[9] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. In TIST, 2011.
[10] Huili Chen, Bita Darvish Rouhani, Cheng Fu, Jishen Zhao, and Farinaz Koushanfar. DeepMarks: A secure fingerprinting framework for digital rights management of deep learning models. In ICMR, 2019.
[11] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse image synthesis for multiple domains. In CVPR, 2020.
[12] Ingemar Cox, Matthew Miller, Jeffrey Bloom, and Chris Honsinger. Digital watermarking. Springer, 2002.
[13] Ricard Durall, Margret Keuper, and Janis Keuper. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. In CVPR, 2020.
[14] Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, and Janis Keuper. Unmasking deepfakes with simple features. arXiv, 2019.
[15] Joel Frank, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, and Thorsten Holz. Leveraging frequency analysis for deep fake image recognition. In ICML, 2020.
[16] Jessica Fridrich. Steganography in digital media: principles, algorithms, and applications. Cambridge University Press, 2009.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
[18] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In NeurIPS, 2017.
[19] Jamie Hayes and George Danezis. Generating steganographic images via adversarial training. In NeurIPS, 2017.
[20] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
[21] Vojtech Holub and Jessica Fridrich. Designing steganographic distortion using directional filters. In WIFS, 2012.
[22] Vojtech Holub, Jessica Fridrich, and Tomas Denemark. Universal distortion function for steganography in an arbitrary domain. In EURASIP JIS, 2014.
[23] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
[24] Charlotte Jee. An Indian politician is using deepfake technology to win new voters. 2020.
[25] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.
[26] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, 2019.
[27] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In CVPR, 2020.
[28] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, 2009.
[29] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[30] Daniel Lerch-Hostalot and David Megías. Unsupervised steganalysis based on artificial training sets. In EAAI, 2016.
[31] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, 2015.
[32] Zhengzhe Liu, Xiaojuan Qi, Jiaya Jia, and Philip Torr. Global texture enhancement for fake face detection in the wild. In CoRR, 2020.
[33] Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, and Peyman Milanfar. Distortion agnostic deep watermarking. In CVPR, 2020.
[34] Francesco Marra, Diego Gragnaniello, Luisa Verdoliva, and Giovanni Poggi. Do GANs leave artificial fingerprints? In MIPR, 2019.
[35] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In ICLR, 2018.
[36] Taesung Park, Alexei A. Efros, Richard Zhang, and Jun-Yan Zhu. Contrastive learning for unpaired image-to-image translation. In ECCV, 2020.
[37] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In CVPR, 2019.
[38] Tomas Pevny, Tomas Filler, and Patrick Bas. Using high-dimensional image models to perform highly undetectable steganography. In IWIH, 2010.
[39] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
[40] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. arXiv, 2019.
[41] Dan Robitzski. Someone used deepfake tech to invent a fake journalist. 2020.
[42] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[43] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. DeepSigns: An end-to-end watermarking framework for protecting the ownership of deep neural networks. In ASPLOS, 2019.
[44] Matthew Tancik, Ben Mildenhall, and Ren Ng. StegaStamp: Invisible hyperlinks in physical photographs. In CVPR, 2020.
[45] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. Face2Face: Real-time face capture and reenactment of RGB videos. In CVPR, 2016.
[46] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin'ichi Satoh. Embedding watermarks into deep neural networks. In ICMR, 2017.
[47] James Vincent. An online propaganda campaign used AI-generated headshots to create fake journalists. 2020.
[48] Vedran Vukotic, Vivien Chappelier, and Teddy Furon. Are deep neural networks good for blind image watermarking? In WIFS, 2018.
[49] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. CNN-generated images are surprisingly easy to spot... for now. In CVPR, 2020.
[50] Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv, 2015.
[51] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Generative image inpainting with contextual attention. In CVPR, 2018.
[52] Ning Yu, Larry S Davis, and Mario Fritz. Attributing fake images to GANs: Learning and analyzing GAN fingerprints. In ICCV, 2019.
[53] Ning Yu, Vladislav Skripniuk, Dingfan Chen, Larry Davis, and Mario Fritz. Responsible disclosure of generative models using scalable fingerprinting. arXiv, 2020.
[54] Baiwu Zhang, Jin Peng Zhou, Ilia Shumailov, and Nicolas Papernot. Not my deepfake: Towards plausible deniability for machine-generated media. arXiv, 2020.
[55] Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and Ian Molloy. Protecting intellectual property of deep neural networks with watermarking. In CCS Asia, 2018.
[56] Ru Zhang, Shiqi Dong, and Jianyi Liu. Invisible steganography via generative adversarial networks. In Multimedia Tools and Applications, 2019.
[57] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in GAN fake images. In WIFS, 2019.
[58] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. HiDDeN: Hiding data with deep networks. In ECCV, 2018.
[59] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
[60] Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In NeurIPS, 2017.


8. Supplementary Material

A. Implementation Details

Steganography encoder. The encoder is trained to embed a fingerprint into an image while minimizing the pixel difference between the input and stego images. We follow the technical details in [44]. The binary fingerprint vector is first passed through a fully-connected layer and then reshaped into a tensor with one channel dimension and the same spatial dimensions as the cover image. We then concatenate this fingerprint tensor and the image along the channel dimension as the input to a U-Net architecture [42]. The output of the encoder, the stego image, has the same size as the input image. Note that passing the fingerprint through a fully-connected layer allows every bit of the binary sequence to be encoded over the entire spatial dimensions of the input image and stays flexible to the image size. The fingerprint length is set to 100 as suggested in [44]; a length of 100 bits provides a large enough space for fingerprint allocation while not having a side effect on fidelity. We visualize an example encoder architecture in Figure 5 with image size 128×128 for CelebA and LSUN Bedroom. For other image sizes, the architecture is simply scaled up or down with more or fewer layers.
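A minimal PyTorch sketch of the described encoder interface; a small convolutional stack stands in for the U-Net [42], so layer widths and depths here are placeholders rather than the exact architecture of Figure 5.

```python
import torch
import torch.nn as nn

class FingerprintEncoder(nn.Module):
    """Fingerprint -> FC -> reshape -> concat with image -> image network."""
    def __init__(self, n_bits=100, image_size=128):
        super().__init__()
        self.image_size = image_size
        self.fc = nn.Linear(n_bits, image_size * image_size)
        self.backbone = nn.Sequential(        # stand-in for the U-Net [42]
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, image, fingerprint):
        # image: (B, 3, H, W); fingerprint: (B, n_bits) in {0, 1}
        fmap = self.fc(fingerprint).view(-1, 1, self.image_size, self.image_size)
        return self.backbone(torch.cat([image, fmap], dim=1))  # stego image
```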

Steganography decoder. The decoder is trained to detect the hidden fingerprint from the stego image. We follow the technical details in [44]. It consists of a series of convolutional layers with kernel size 3×3 and strides ≥ 1, dense layers, and a sigmoid output activation to produce a final output with the same length as the binary fingerprint vector. We visualize an example decoder architecture in Figure 6 with image size 128×128 for CelebA and LSUN Bedroom. For other image sizes, the architecture is simply scaled up or down with more or fewer layers.
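A matching decoder sketch; the number of layers and hidden sizes are placeholders rather than the exact architecture of Figure 6.

```python
import torch.nn as nn

class FingerprintDecoder(nn.Module):
    """Strided convs -> dense layers -> sigmoid, one soft bit per position."""
    def __init__(self, n_bits=100, image_size=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dense = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (image_size // 8) ** 2, 512), nn.ReLU(),
            nn.Linear(512, n_bits), nn.Sigmoid(),
        )

    def forward(self, stego):
        return self.dense(self.conv(stego))  # (B, n_bits) soft bits in (0, 1)
```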

Steganography training. The encoder and decoder are jointly trained end-to-end w.r.t. the objective in Eq. 1 of the main paper, with randomly sampled fingerprints. The encoder is trained to balance fingerprint detection and image reconstruction. At the beginning of training, we set λ = 0 to focus on fingerprint detection; otherwise, fingerprints cannot be accurately embedded into images. After the fingerprint detection accuracy reaches 95% (which takes 3-5 epochs), we increase λ linearly up to 10 within 3k iterations to shift the focus more toward image reconstruction. We train the encoder and decoder for 30 epochs in total. With a batch size of 64, it takes about 0.5/2/4 hours to jointly train a 32/128/256-resolution encoder and decoder on 1 NVIDIA Tesla V100 GPU with 16GB memory.
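The λ schedule can be expressed as a small stateful helper; the class and argument names below are illustrative, not from the released code.

```python
class LambdaSchedule:
    """Lambda = 0 until bitwise accuracy first reaches 95%, then a linear
    ramp to 10 over 3k iterations, per the schedule described above."""
    def __init__(self, lambda_max=10.0, ramp_iters=3000, acc_threshold=0.95):
        self.lambda_max = lambda_max
        self.ramp_iters = ramp_iters
        self.acc_threshold = acc_threshold
        self.ramp_start = None

    def __call__(self, iteration, bit_accuracy):
        if self.ramp_start is None:
            if bit_accuracy < self.acc_threshold:
                return 0.0
            self.ramp_start = iteration
        progress = (iteration - self.ramp_start) / self.ramp_iters
        return self.lambda_max * min(1.0, progress)
```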

Our steganography code is modified from the official StegaStamp [44] GitHub repository (https://github.com/tancik/StegaStamp). Our solution is favorably agnostic to generative model techniques because we only process the training data. Therefore, for generative model training, we directly use the corresponding GitHub repositories without any change: ProGAN [25] (https://github.com/jeromerony/Progressive_Growing_of_GANs-PyTorch), StyleGAN [26] and StyleGAN2 [27] (config E) (https://github.com/NVlabs/stylegan2), BigGAN [5] (https://github.com/ajbrock/BigGAN-PyTorch), and CUT [36] (https://github.com/taesungp/contrastive-unpaired-translation).

B. Additional Samples

See Figures 7, 8, 9, 10, and 11 for fingerprinted samples on a variety of generation applications, models, and datasets. We reach the same conclusion as in Section 5.3 of the main paper: the fingerprints are imperceptibly transferred to the generative models and then to the generated images.

C. Robustness of ProGAN on LSUN Bedroom

We additionally experiment on the robustness of ProGAN trained on LSUN Bedroom. We plot the bitwise accuracy w.r.t. the amount of perturbation in Figure 12. We obtain the same conclusions as in Section 5.4 of the main paper. Specifically, the working range for each perturbation is: Gaussian noise standard deviation in [0.0, 0.1], Gaussian blur kernel size in [0, 7], JPEG compression quality in [30, 100], and center cropping size in [108, 128], which are reasonably wide ranges in practice.



Figure 5: Steganography encoder architecture.

Figure 6: Steganography decoder architecture.


Figure 7: LSUN Bedroom samples at 128×128 for the last two columns of Table 1 in the main paper, supplementary to Figure 2 in the main paper. (a) Original real training samples. (b) Fingerprinted real training samples. (c) The difference between (a) and (b), 10× magnified for easier visualization. (d) Samples from the non-fingerprinted ProGAN. (e) Samples from the fingerprinted ProGAN.



Figure 8: LSUN Cat samples at 256×256 for the last two columns of Table 1 in the main paper, supplementary to Figure 2 in the main paper. (a) Samples from the non-fingerprinted StyleGAN2. (b) Samples from the fingerprinted StyleGAN2.


Figure 9: CIFAR-10 samples at 32×32 for the last two columns of Table 1 in the main paper, supplementary to Figure 2 in the main paper. (a) Samples from the non-fingerprinted BigGAN. (b) Samples from the fingerprinted BigGAN.



Figure 10: Horse→Zebra samples at 256×256 for the last two columns of Table 1 in the main paper, supplementary to Figure 2 in the main paper. (a) Real source samples for input conditioning. (b) Samples from the non-fingerprinted CUT. (c) Samples from the fingerprinted CUT.


Figure 11: Cat→Dog samples at 256×256 for the last two columns of Table 1 in the main paper, supplementary to Figure 2 in the main paper. (a) Real source samples for input conditioning. (b) Samples from the non-fingerprinted CUT. (c) Samples from the fingerprinted CUT.

Figure 12: Red plots show artificial fingerprint detection in bitwise accuracy w.r.t. the amount of perturbation for ProGAN trained on LSUN Bedroom. Blue dots represent detection accuracy on the fingerprinted real training images, which serve as upper-bound references for the red dots. This figure is supplementary to Figure 3 in the main paper.