Robustness in GANs and in Black-box Optimization Stefanie Jegelka MIT CSAIL joint work with Zhi Xu, Chengtao Li, Ilija Bogunovic, Jonathan Scarlett and Volkan Cevher


Page 1: Robustness in GANs and in Black-box Optimization

Robustness in GANs and in Black-box Optimization

Stefanie Jegelka, MIT CSAIL

joint work with Zhi Xu, Chengtao Li, Ilija Bogunovic, Jonathan Scarlett and Volkan Cevher

Page 2: Robustness in GANs and in Black-box Optimization

Robustness in ML

Robustness in GANs

Representational Power in Deep Learning

Robust Black-Box Optimization

One unit is enough!

[Figure: Gaussian Process Regression, y(x) vs. x; examples include WiFi localization and the C14 calibration curve.]

Robust Optimization, Generalization, Discrete & Nonconvex Optimization

[Diagram: Generator, Critic, "noise"]

Page 3: Robustness in GANs and in Black-box Optimization

Generative Adversarial Networks

Generator

Discriminator

\min_G \max_D V(G, D)


[Figure 14 (paper screenshot): (a) DCGAN trained with BN, (b) DCGAN trained without BN, (c) DAN-S trained without BN, (d) DAN-2S trained without BN, on the CelebA dataset. Both DAN-S and DAN-2S demonstrate much more diversity in generated samples compared to DCGAN.]

[GAN schematic: random "noise" z → G(z); real data x; discriminator outputs D(x), D(G(z))]

• attack: with probability p < 0.5, the discriminator's output is manipulated

• generator doesn’t know which feedback is honest

discriminator: \max_D V(G, D)        generator: \min_G V(G, A(D))

Page 4: Robustness in GANs and in Black-box Optimization

Generative Adversarial Networks

Generator

Discriminator


• attack: with probability p < 0.5, the discriminator's output is manipulated

• generator doesn’t know which feedback is honest

Theorem: If the adversary performs a simple sign flip, then the standard GAN no longer learns the right distribution.

A(D(x)) = \begin{cases} 1 - D(x) & \text{with probability } p \\ D(x) & \text{otherwise.} \end{cases}
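To make the attack model concrete, here is a minimal sketch of the sign-flipping adversary applied to a batch of discriminator outputs (the function name and batch setup are mine, not from the talk):

```python
import numpy as np

def flip_adversary(d_out, p, rng=None):
    """Sign-flipping adversary: each discriminator output D(x) in [0, 1] is
    replaced by 1 - D(x) independently with probability p (< 0.5); otherwise
    it is passed through. The generator only ever sees the manipulated values
    and cannot tell which feedback is honest."""
    rng = rng or np.random.default_rng()
    d_out = np.asarray(d_out, dtype=float)
    flip = rng.random(d_out.shape) < p
    return np.where(flip, 1.0 - d_out, d_out)

print(flip_adversary([0.9, 0.2, 0.7], p=0.4))  # e.g. [0.1, 0.2, 0.7]
```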

Page 5: Robustness in GANs and in Black-box Optimization

What makes GANs more robust?

[GAN schematic: random "noise" z → Generator; data x → Discriminator]

\min_G \max_D V(G, D)

\min_G \max_D \ \mathbb{E}_{x \sim P_{\mathrm{data}}}[f(D(x))] + \mathbb{E}_{z \sim P_z}[f(1 - D(G(z)))]

(first term: score for real data; second term: "inverse" score for fake data)

1. Model properties: transformation function

If f:
• is strictly increasing and differentiable
• satisfies the symmetry f(a) = -f(1 - a) for a ∈ [0, 1]
then the model is robust.
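The symmetry condition is easy to verify numerically; a sketch with example transformation functions of my own choosing (in the spirit of the Linear/Tanh models that appear in the experiments later), contrasted with the standard GAN's f(a) = log a, which fails the test:

```python
import numpy as np

def is_symmetric(f, tol=1e-9):
    """Numerically check the robustness condition f(a) = -f(1 - a) on (0, 1)."""
    a = np.linspace(1e-6, 1.0 - 1e-6, 1001)
    return np.allclose(f(a), -f(1.0 - a), atol=tol)

print(is_symmetric(lambda a: a - 0.5))           # linear:       True
print(is_symmetric(lambda a: np.tanh(a - 0.5)))  # shifted tanh: True
print(is_symmetric(np.log))                      # standard GAN: False
```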

Page 6: Robustness in GANs and in Black-box Optimization

What makes GANs more robust?

[GAN schematic: random "noise" z → Generator; data x → Discriminator]

\min_G \max_D V(G, D)

1. Model properties: symmetric transformation function

2. Training Algorithm: weight clipping (regularization) helps robustness

Generalization includes Wasserstein GANs!
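The clipping step itself is a one-liner; a minimal sketch with plain NumPy arrays standing in for the discriminator's weights (in a real framework one would clamp the model's parameters analogously after each update):

```python
import numpy as np

def clip_weights(weights, c=0.1):
    """Weight clipping as regularization: after each discriminator update,
    project every weight into [-c, c], limiting the capacity of D."""
    return [np.clip(w, -c, c) for w in weights]

# hypothetical two-layer discriminator weights
weights = [np.random.randn(64, 128), np.random.randn(128)]
weights = clip_weights(weights, c=0.1)
```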

Page 7: Robustness in GANs and in Black-box Optimization

Empirical Results

• MNIST

…D with a maximum absolute value of 0.1. Here, we apply the simple flipping adversary (2.2) with different error probabilities p. Figure 3 shows some typical results for one of the robust models and GAN, for p = 0 and p = 0.4. Indeed, as opposed to the GAN, the robust GAN reliably learns all modes, even with a fairly high p = 0.4. The figure illustrates an additional intuition: with higher noise p, learning indeed becomes more challenging, and the generator needs more iterations to learn. Figure 10 in the appendix confirms that this is generally the case.

Clipping and its interaction with the model. To probe the effect of regularization, we next vary the clipping threshold. Figure 4 shows the success rate for different clipping thresholds, averaged over 10 runs. Here, a success is defined as correctly learning all 8 modes (average numbers of modes are shown in the appendix). To better visualize the effect of clipping, D is intentionally made more powerful by having significantly more hidden neurons (4x more hidden neurons for each layer).

[Figure 4: Success rates (top row) and average number of learned modes (bottom row) vs. clipping threshold (None, 1, 0.5, 0.1, 0.05, 0.01) for error probabilities 0, 0.2, and 0.4. Models: GAN, Linear, Tanh, Erf, Piece, Piece2.]

Figure 4 offers several observations: (1) The training algorithm indeed affects the results. A too powerful discriminator (small threshold) generally impairs learning [2, 24]; very small thresholds limit the capacity of D too much. (2) However, consistently, if a robust GAN can learn the distribution without noise (p = 0), then it also learns the distribution with noise, confirming its robustness. In general, the robust GANs work across a wider range of clipping thresholds, i.e., they are less sensitive to the choice of the clipping parameter. These observations support our theoretical analysis in Sections 2 and 4.

An interesting phenomenon to note is that in some cases, clipping may increase the empirical robustness of the standard GAN, although it is still more sensitive than the robust models (threshold 0.05 and p = 0.4). This phenomenon is orthogonal to our theoretical results; here, the algorithm aids empirical stability. It points to the important role of training algorithms in practice, a line of research underlying recent state-of-the-art results [32, 16, 38, 24]. In the next section we will see that, while clipping may help robustness, the GAN is also more brittle towards the choice of algorithm (amount of regularization). Extending the theory to include both aspects, model and algorithm, is an interesting avenue of future research.

5.3 MNIST

[Figure 5: Example results on MNIST (no clipping); GAN vs. Piece_GAN at error probability 0 and 0.3.]

Next, we perform a similar analysis with the MNIST data, using a CNN for both G and D. Without clipping, both the GAN and the robust GANs learn the distribution well, and generate all digits with the same probability. Here, we call a learning experiment a success if G learns to generate all digits with the same probability (see the appendix for a plot of the total variation distance between the distribution of the learned digits and the uniform distribution). To further explore our theoretical frameworks, we apply a more sophisticated adversary as follows: A(D(x)) equals 1 - D(x) or \sqrt{D(x)} or D(x)^2, each with probability 0.1, and otherwise A(D(x)) = D(x) (i.e., honest feedback with probability 0.7). By Theorem 6, all the robust models we explore here should be robust against…
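The MNIST success criterion can be made concrete; a small sketch (a hypothetical helper, not from the paper) of the total-variation distance between the generated digit distribution and the uniform distribution:

```python
import numpy as np

def tv_to_uniform(digit_counts):
    """Total variation distance between the empirical distribution of the
    generated digit labels and the uniform distribution over the 10 digits."""
    p = np.asarray(digit_counts, dtype=float)
    p /= p.sum()
    return 0.5 * np.abs(p - 0.1).sum()

print(tv_to_uniform([100] * 10))        # perfectly uniform generator: 0.0
print(tv_to_uniform([1000] + [0] * 9))  # total mode collapse:         0.9
```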

(The excerpt appears twice on the slide, as side-by-side screenshots annotated "GAN" and "stable GAN".)

Page 8: Robustness in GANs and in Black-box Optimization

Empirical Results

[Figure: success rate (top) and TV distance to the uniform distribution (bottom) vs. clipping threshold (None, 10, 1, 0.1, 0.01, 0.001), for error probabilities 0 and 0.3; models: GAN, Linear, Tanh, Erf, Piece, Piece2.]

[Annotated plot, error probability = 0.3: success rate between 100% success (no clipping) and 100% failure (strict clipping); curves for the GAN and the stable GANs.]

• Weight clipping (regularization) helps robustness
• Stable GANs are more robust to the choice of clipping threshold

Page 9: Robustness in GANs and in Black-box Optimization

Black-Box Optimization

Goal: optimize a complex function f(x) that is only accessible via expensive queries.

2 Automatic Gait Optimization

[Figure 1: The bio-inspired dynamical bipedal walker Fox used for the experimental evaluation.]

The search for appropriate parameters of a controller can be formulated as an optimization problem, such as the minimization

\min_{\theta \in \mathbb{R}^D} f(\theta)    (1)

of an objective function f(·) with respect to the parameters θ. In the context of gait optimization, this optimization problem is characterized as a global optimization of a zero-order stochastic objective function. Therefore, the use of Bayesian optimization well suits this challenging optimization task.

Bayesian Optimization. Bayesian optimization is an iterative model-based global optimization method [7, 5, 11, 1]. In Bayesian optimization, for each iteration (i.e., evaluation of the objective function f), a GP model θ ↦ f(θ) is learned from the data set T = {θ, f(θ)} composed of the past parameters θ and the corresponding measurements f(θ) of the objective function. This model is used to predict the response surface f̂ and the corresponding acquisition surface α(θ)¹, once the response surface f̂(·) is mapped through the acquisition function α(·). Using a global optimizer, the minimum θ* of the acquisition surface α(θ) is computed without any evaluation of the objective function, e.g., no robot interaction. The current minimum θ* is evaluated on the robot and, together with the resulting measurement f(θ*), added to the data set T.

Gaussian Process Model for Objective Function. To create the model that maps θ ↦ f(θ), we make use of Bayesian non-parametric Gaussian process regression [12]. Such a GP is a distribution over functions

f(θ) ∼ GP(m_f, k(θ_p, θ_q)),    (2)

fully defined by a prior mean m_f and a covariance function k(θ_p, θ_q). As prior mean, we choose m_f ≡ 0, while the chosen covariance function k(θ_p, θ_q) is the squared exponential with automatic relevance determination and Gaussian noise

k(θ_p, θ_q) = σ_f² exp(−½ (θ_p − θ_q)ᵀ Λ⁻¹ (θ_p − θ_q)) + σ_w² δ_pq    (3)

with Λ = diag([l_1², ..., l_D²]). Here, the l_i are the characteristic length-scales, σ_f² is the variance of the latent function f(·), and σ_w² the noise variance. A practical issue, for both GP modeling and Bayesian optimization, is the choice of the hyperparameters of the GP model, such as the characteristic length-scales l_i, the variance of the latent function σ_f², and the noise variance σ_w². In gait optimization, these hyperparameters are often fixed a priori [9]. There are suggestions [8] that fixing the hyperparameters can considerably speed up the convergence of Bayesian optimization. However, manually choosing the value of the hyperparameters requires extensive expert knowledge about the system that we want to optimize, which is often an unrealistic assumption. An alternative common approach is to automatically select the hyperparameters by optimizing with respect to the marginal likelihood [12].

Acquisition Function. A number of acquisition functions α(·) exist, such as Probability of Improvement [7], Expected Improvement [10], Upper Confidence Bound [3] and Entropy-Based Improvements [4]. Experimental results [4] suggest that Expected Improvement on specific families of artificial functions performs better on average than Probability of Improvement and Upper Confidence Bound. However, these results do not necessarily hold true for real-world problems such as gait optimization, where the objective functions are more complex to model. Both Probability of Improvement [9] and Expected Improvement [14] have been previously employed in gait optimization. In our experiments, we evaluate Probability of Improvement, Expected Improvement and Upper Confidence Bound.

¹ The correct notation would be α(f̂(θ)), but we use α(θ) for notational convenience.
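Equation (3) translates directly to code; a minimal NumPy sketch of the squared-exponential ARD kernel with Gaussian noise (function and argument names are my own):

```python
import numpy as np

def se_ard_kernel(theta_p, theta_q, lengthscales, sigma_f2, sigma_w2):
    """k(th_p, th_q) = sigma_f^2 exp(-0.5 d^T Lambda^{-1} d) + sigma_w^2 delta_pq,
    with d = th_p - th_q and Lambda = diag(l_1^2, ..., l_D^2), as in Eq. (3)."""
    d = (np.asarray(theta_p) - np.asarray(theta_q)) / np.asarray(lengthscales)
    k = sigma_f2 * np.exp(-0.5 * d @ d)
    if np.array_equal(theta_p, theta_q):
        k += sigma_w2  # the noise term only enters on the diagonal
    return k

print(se_ard_kernel([0.0, 0.0], [1.0, 2.0],
                    lengthscales=[1.0, 2.0], sigma_f2=1.0, sigma_w2=0.01))
```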

Page 10: Robustness in GANs and in Black-box Optimization

Black-Box Optimization

Sequential optimization: build an internal model of f(x)

In each time step:

• Select a query point x

• Observe (noisy) f(x)

• Update the model

often: a Gaussian process

[Figure: Gaussian Process Regression, y(x) vs. x; examples include WiFi localization and the C14 calibration curve.]

select queries using uncertainty and expected function value, e.g., maximizing

\mathrm{ucb}(x) = \hat{f}(x) + \beta^{1/2} \sigma(x)
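A sketch of UCB-based query selection over a discrete candidate set, assuming `mu` and `sigma` stand in for the GP posterior mean and standard deviation at the candidates (the stand-in posterior below is illustrative only):

```python
import numpy as np

def select_query_ucb(candidates, mu, sigma, beta=2.0):
    """Pick the candidate maximizing ucb(x) = mu(x) + beta^{1/2} sigma(x):
    large posterior mean (exploitation) plus large uncertainty (exploration)."""
    ucb = mu + np.sqrt(beta) * sigma
    return candidates[int(np.argmax(ucb))]

x = np.linspace(-2.0, 2.0, 201)            # candidate queries
mu, sigma = np.sin(x), 0.1 + 0.3 * x**2    # stand-in GP posterior
print(select_query_ucb(x, mu, sigma))
```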

Page 11: Robustness in GANs and in Black-box Optimization

Robust Optimization with GPs

Observed f(x) may be (adversarially) perturbed:

• Robotics: simulations vs. real data

• Parameter tuning: estimation with limited data

• Time-varying functions

Worst-case value at x: \min_{\delta \in \Delta_\epsilon(x)} f(x + \delta), with

\Delta_\epsilon(x) = \{x' - x : x' \in D \text{ and } d(x, x') \le \epsilon\}

Robust objective: \max_{x \in D} \min_{\delta \in \Delta_\epsilon(x)} f(x + \delta)

Problem: standard methods can fail! (suboptimal decision!)
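On a discrete domain the robust objective can be evaluated directly; a small sketch (illustrative names, 1-D domain with d(x, x') = |x - x'|):

```python
import numpy as np

def robust_argmax(domain, f_vals, eps):
    """Evaluate max_{x in D} min_{delta in Delta_eps(x)} f(x + delta) on a
    discrete 1-D grid: pick the point whose eps-neighborhood has the best
    worst-case value."""
    best_x, best_val = None, -np.inf
    for x in domain:
        nbhd = np.abs(domain - x) <= eps   # the points x + Delta_eps(x)
        worst = f_vals[nbhd].min()         # adversary's best perturbation
        if worst > best_val:
            best_x, best_val = x, worst
    return best_x, best_val

xs = np.linspace(0.0, 1.0, 101)
print(robust_argmax(xs, np.sin(6 * xs), eps=0.1))
```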

Page 12: Robustness in GANs and in Black-box Optimization

StableOpt Algorithm

In every round:

• select \tilde{x}_t = \arg\max_{x \in D} \min_{\delta \in \Delta_\epsilon(x)} \mathrm{ucb}_{t-1}(x + \delta) (upper confidence bound)

• select \delta_t = \arg\min_{\delta \in \Delta_\epsilon(\tilde{x}_t)} \mathrm{lcb}_{t-1}(\tilde{x}_t + \delta) (lower confidence bound)

• observe y_t = f(\tilde{x}_t + \delta_t) + z_t and update the model with \{(\tilde{x}_t + \delta_t, y_t)\}

Theorem (RBF kernel): sample complexity O^*\big(\tfrac{1}{\eta^2}(\log \tfrac{1}{\eta})^{2p}\big) steps for \eta-suboptimality w.h.p.

• closely matching lower bound
• the algorithm generalizes to many other settings!

[Figure: Gaussian Process Regression, y(x) vs. x; examples include WiFi localization and the C14 calibration curve.]
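A minimal sketch of one StableOpt round over a discrete 1-D domain, assuming `ucb` and `lcb` are arrays from the current GP posterior and `query` is a hypothetical noisy oracle for f:

```python
import numpy as np

def stableopt_round(domain, ucb, lcb, eps, query):
    """One StableOpt round: optimistic max-min selection via ucb_{t-1},
    pessimistic perturbation via lcb_{t-1}, then a noisy observation
    that is added to the GP's data set."""
    in_ball = lambda x: np.abs(domain - x) <= eps          # Delta_eps(x)
    robust_ucb = [ucb[in_ball(x)].min() for x in domain]   # min_delta ucb(x+delta)
    x_t = domain[int(np.argmax(robust_ucb))]               # argmax_x of the above
    idx = np.flatnonzero(in_ball(x_t))
    x_pert = domain[idx[int(np.argmin(lcb[idx]))]]         # x_t + delta_t
    y_t = query(x_pert)                                    # y_t = f(x_t+delta_t) + z_t
    return x_pert, y_t                                     # update the GP with this pair
```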

Page 13: Robustness in GANs and in Black-box Optimization

Variations

Robustness to unknown parameters
• f : D × Θ → ℝ is smooth w.r.t. input x and parameters θ
• \max_{x \in D} \min_{\theta \in \Theta} f(x, \theta)

Robust estimation
• \bar{\theta} is an estimate of the true parameters \theta^* \in \Theta
• \max_{x \in D} \min_{\delta_\theta \in \Delta_\epsilon(\bar{\theta})} f(x, \bar{\theta} + \delta_\theta)

Robust group identification
• input space partitioned into groups \mathcal{G} = \{G_1, \ldots, G_k\}; find the group with the highest worst-case value
• \max_{G \in \mathcal{G}} \min_{x \in G} f(x)
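The group-identification variant reduces to a max-min over the partition; a toy sketch with made-up groups and objective:

```python
import numpy as np

def best_robust_group(groups, f):
    """Robust group identification: argmax_{G in partition} min_{x in G} f(x),
    i.e. the group whose worst member is best."""
    return max(groups, key=lambda G: min(f(x) for x in G))

groups = [np.array([0.1, 0.4]), np.array([0.6, 0.9])]  # toy partition G_1, G_2
print(best_robust_group(groups, lambda x: np.sin(3.0 * x)))
```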

Page 14: Robustness in GANs and in Black-box Optimization

Empirical Results

[Plots: ε-regret vs. t and average minimum objective value vs. t (t = 0–100) for StableOpt, GP-UCB, MaxiMin-GP-UCB, Stable-GP-UCB, and Stable-GP-Random, with arrows marking the better direction; plus two contour plots of the benchmark objective over (x, y).]

Benchmark [Bertsimas et al.’10]; Robot Pushing.

Page 15: Robustness in GANs and in Black-box Optimization

[Figure: Gaussian Process Regression, y(x) vs. x; examples include WiFi localization and the C14 calibration curve.]

Robustness in ML

Robustness in GANs (Z. Xu, C. Li, S. Jegelka, arXiv)

Representational Power in Deep Learning

Robust Black-Box Optimization (I. Bogunovic, J. Scarlett, S. Jegelka, V. Cevher, NIPS 2018)

One unit is enough!

Robust Optimization, Generalization, Discrete & Nonconvex Optimization

[Diagram: Generator, Critic, "noise"]