
Page 1: Invariance and Stability to Deformations of Deep ...thoth.inrialpes.fr/people/mairal/resources/pdf/telecom_workshop.pdf · Invariance and Stability to Deformations of Deep Convolutional

Invariance and Stability to Deformations of Deep Convolutional Representations

Julien Mairal

Inria Grenoble

ML and AI workshop, Telecom ParisTech, 2018

Julien Mairal Invariance and stability of DL 1/35

Page 2: Invariance and Stability to Deformations of Deep ...thoth.inrialpes.fr/people/mairal/resources/pdf/telecom_workshop.pdf · Invariance and Stability to Deformations of Deep Convolutional

This is mostly the work of Alberto Bietti

A. Bietti and J. Mairal. Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations. arXiv:1706.03078, 2018.

A. Bietti and J. Mairal. Invariance and Stability of Deep Convolutional Representations. NIPS, 2017.

Page 3

Learning a predictive model

The goal is to learn a prediction function f : R^p → R given labeled training data (x_i, y_i), i = 1, ..., n, with x_i in R^p and y_i in R:

min_{f∈F} (1/n) ∑_{i=1}^n L(y_i, f(x_i)) + λ Ω(f),

where the sum is the empirical risk (data fit) and λΩ(f) is the regularization term.
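For concreteness, with the squared loss, a linear model f(x) = 〈w, x〉 and Ω(f) = ‖w‖², this objective is ridge regression. A minimal NumPy sketch on synthetic data (all variable names are ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 5, 0.1
X = rng.standard_normal((n, p))                 # training inputs x_i in R^p
w_true = rng.standard_normal(p)
y = X @ w_true + 0.1 * rng.standard_normal(n)   # labels y_i in R

# Empirical risk + ridge regularization:
#   min_w (1/n) sum_i (y_i - <w, x_i>)^2 + lam * ||w||^2
# Setting the gradient to zero gives (X^T X / n + lam I) w = X^T y / n.
w = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def objective(w):
    return np.mean((y - X @ w) ** 2) + lam * w @ w

# The closed-form solution should (near-)minimize the objective.
assert objective(w) <= objective(w_true) + 1e-9
```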

Page 4

Objectives

Deep convolutional signal representations

Are they stable to deformations?

How can we achieve invariance to transformation groups?

Do they preserve signal information?

Learning aspects

Building a functional space for CNNs (or similar objects).

Deriving a measure of model complexity.

[Embedded slide: "A quick zoom on convolutional neural networks" — learning a CNN still involves the same ERM problem min_{f∈F} (1/n) ∑_{i=1}^n L(y_i, f(x_i)) + λΩ(f). LeCun et al., 1989, 1998, Ciresan et al., 2012, Krizhevsky et al., 2012]...

Page 5

A kernel perspective

min_{f∈H} (1/n) ∑_{i=1}^n L(y_i, f(x_i)) + λ‖f‖²_H.

map data to a Hilbert space (RKHS) and work with linear forms:

Φ : X → H and f(x) = 〈Φ(x), f〉_H.

[Figure: the feature map ϕ sends data points from the input space X to the Hilbert space H.]

Page 6

A kernel perspective

min_{f∈H} (1/n) ∑_{i=1}^n L(y_i, f(x_i)) + λ‖f‖²_H.

Main purpose: embed data in a vectorial space where

many geometrical operations exist (angle computation, projection on linear subspaces, definition of barycenters, ...).

one may learn potentially rich infinite-dimensional models.

regularization is natural:

|f(x)− f(x′)| ≤ ‖f‖H‖Φ(x)− Φ(x′)‖H.
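The Lipschitz bound above follows from Cauchy-Schwarz in H, and it can be checked numerically: for a kernel K, ‖Φ(x) − Φ(x′)‖²_H = K(x, x) + K(x′, x′) − 2K(x, x′). A sketch with a Gaussian kernel and arbitrary synthetic data (all names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
K = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))   # Gaussian kernel

# A function in the RKHS: f = sum_i alpha_i K(., x_i)
Xs = rng.standard_normal((10, 2))
alpha = rng.standard_normal(10)
G = np.array([[K(a, b) for b in Xs] for a in Xs])       # Gram matrix
f_norm = np.sqrt(alpha @ G @ alpha)                     # ||f||_H

def f(x):
    return sum(a * K(x, xi) for a, xi in zip(alpha, Xs))

def feat_dist(x, xp):                                   # ||Phi(x) - Phi(x')||_H
    return np.sqrt(max(K(x, x) + K(xp, xp) - 2 * K(x, xp), 0.0))

# |f(x) - f(x')| <= ||f||_H ||Phi(x) - Phi(x')||_H holds for every pair.
for _ in range(100):
    x, xp = rng.standard_normal(2), rng.standard_normal(2)
    assert abs(f(x) - f(xp)) <= f_norm * feat_dist(x, xp) + 1e-9
```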

Page 7

A kernel perspective

Second purpose: unhappy with the current Euclidean structure?

lift data to a higher-dimensional space with nicer properties (e.g., linear separability, clustering structure).

then, the linear form f(x) = 〈Φ(x), f〉H in H may correspond to a non-linear model in X .

[Figure: data in R² with coordinates (x1, x2), non-linearly separable before the lifting and linearly separable after.]

Page 8

A kernel perspective

Recipe

Map data x to high-dimensional space, Φ(x) in H (RKHS), with Hilbertian geometry (projections, barycenters, angles, ..., exist!).

predictive models f in H are linear forms in H: f(x) = 〈f,Φ(x)〉H.

Learning with a positive definite kernel K(x, x′) = 〈Φ(x),Φ(x′)〉H.

[Scholkopf and Smola, 2002, Shawe-Taylor and Cristianini, 2004]...

Page 9

A kernel perspective

Recipe

Map data x to high-dimensional space, Φ(x) in H (RKHS), with Hilbertian geometry (projections, barycenters, angles, ..., exist!).

predictive models f in H are linear forms in H: f(x) = 〈f,Φ(x)〉H.

Learning with a positive definite kernel K(x, x′) = 〈Φ(x),Φ(x′)〉H.

What is the relation with deep neural networks?

It is possible to design a RKHS H where a large class of deep neural networks live [Mairal, 2016]:

f(x) = σk(Wk σk–1(Wk–1 · · · σ2(W2 σ1(W1x)) · · ·)) = 〈f, Φ(x)〉H.

This is the construction of “convolutional kernel networks”.

[Scholkopf and Smola, 2002, Shawe-Taylor and Cristianini, 2004]...

Page 10

A kernel perspective

Recipe

Map data x to high-dimensional space, Φ(x) in H (RKHS), with Hilbertian geometry (projections, barycenters, angles, ..., exist!).

predictive models f in H are linear forms in H: f(x) = 〈f,Φ(x)〉H.

Learning with a positive definite kernel K(x, x′) = 〈Φ(x),Φ(x′)〉H.

Why do we care?

Φ(x) is related to the network architecture and is independent of training data. Is it stable? Does it lose signal information?

f is a predictive model. Can we control its stability?

|f(x)− f(x′)| ≤ ‖f‖H‖Φ(x)− Φ(x′)‖H.

‖f‖H controls both stability and generalization!

Page 11

A signal processing perspective, plus a bit of harmonic analysis

Consider images defined on a continuous domain Ω = Rd.

τ : Ω→ Ω: C1-diffeomorphism.

Lτx(u) = x(u− τ(u)): action operator.

Much richer group of transformations than translations.

[Figure: patterns are translated and deformed; invariance to the two-dimensional translation group R².]

[Mallat, 2012, Allassonniere, Amit, and Trouve, 2007, Trouve and Younes, 2005]...

Page 12

A signal processing perspective, plus a bit of harmonic analysis

Relation with deep convolutional representations

Stability to deformations studied for wavelet-based scattering transform.

[Mallat, 2012, Bruna and Mallat, 2013, Sifre and Mallat, 2013]...

Page 13

A signal processing perspective, plus a bit of harmonic analysis

Definition of stability

Representation Φ(·) is stable [Mallat, 2012] if:

‖Φ(Lτx)− Φ(x)‖ ≤ (C1‖∇τ‖∞ + C2‖τ‖∞)‖x‖.

‖∇τ‖∞ = supu ‖∇τ(u)‖ controls deformation.

‖τ‖∞ = supu |τ(u)| controls translation.

C2 → 0: translation invariance.
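The action operator Lτ is easy to simulate on a discretized 1-D signal. A minimal sketch (linear interpolation stands in for the continuous domain; the specific choices of τ are illustrative, not from the slides):

```python
import numpy as np

u = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 5 * u)               # a 1-D signal on [0, 1]

def deform(x, u, tau):
    """L_tau x(u) = x(u - tau(u)), via linear interpolation."""
    return np.interp(u - tau(u), u, x)

# A pure translation (grad tau = 0) vs. a genuine deformation.
translation = lambda t: 0.02 * np.ones_like(t)        # ||tau||_inf = 0.02
deformation = lambda t: 0.02 * np.sin(2 * np.pi * t)  # ||grad tau||_inf ~ 0.13

x_trans = deform(x, u, translation)
x_def = deform(x, u, deformation)

# In the raw L2 metric even a tiny tau moves a high-frequency signal a lot,
# which is exactly why stability is asked of Phi rather than of x itself.
print(np.linalg.norm(x_trans - x) / np.linalg.norm(x))
print(np.linalg.norm(x_def - x) / np.linalg.norm(x))
```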

Page 14

Summary of our results

Multi-layer construction of the RKHS H: contains CNNs with smooth homogeneous activation functions.

Signal representation

Signal preservation of the multi-layer kernel mapping Φ.

Conditions of non-trivial stability for Φ.

Constructions to achieve group invariance.

On learning

Bounds on the RKHS norm ‖.‖H to control stability and generalization of a predictive model f .

|f(x)− f(x′)| ≤ ‖f‖H‖Φ(x)− Φ(x′)‖H.

Page 17

Outline

1 Construction of the multi-layer convolutional representation

2 Invariance and stability

3 Learning aspects: model complexity

Page 18

A generic deep convolutional representation

Initial map x0 in L2(Ω,H0)

x0 : Ω→ H0: continuous input signal

u ∈ Ω = Rd: location (d = 2 for images).

x0(u) ∈ H0: input value at location u (H0 = R3 for RGB images).

Building map xk in L2(Ω,Hk) from xk−1 in L2(Ω,Hk−1)

xk : Ω→ Hk: feature map at layer k

xk = AkMkPkxk−1.

Pk: patch extraction operator, extracts a small patch of the feature map xk−1 around each point u (Pkxk−1(u) is a patch centered at u).

Mk: non-linear mapping operator, maps each patch to a new Hilbert space Hk with a pointwise non-linear function ϕk(·).

Ak: (linear) pooling operator at scale σk.
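A toy discrete analogue of one layer xk = AkMkPkxk−1 on a 1-D signal. This is only a sketch: the pointwise map below is an arbitrary non-expansive stand-in for the kernel mapping ϕk, and all names are ours:

```python
import numpy as np

def patch_extract(x, size):
    """P_k: stack a patch of `size` points around each location of x (L, C)."""
    pad = size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + len(x)] for i in range(size)], axis=1).reshape(len(x), -1)

def phi(z):
    """M_k stand-in: a pointwise map with row-wise ||phi(z)|| <= ||z||."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)
    return norms * np.tanh(z / safe)      # |tanh(a)| <= |a|, so norms shrink

def pool(x, sigma):
    """A_k: Gaussian pooling at scale sigma, applied channel by channel."""
    t = np.arange(-3 * sigma, 3 * sigma + 1)
    h = np.exp(-t**2 / (2 * sigma**2)); h /= h.sum()
    return np.stack([np.convolve(x[:, j], h, mode="same")
                     for j in range(x.shape[1])], axis=1)

x0 = np.random.default_rng(0).standard_normal((64, 3))   # 1-D "RGB" signal
x1 = pool(phi(patch_extract(x0, size=3)), sigma=2.0)     # x_k = A_k M_k P_k x_{k-1}
print(x1.shape)                                           # (64, 9): 3-point patches of 3 channels
```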

Page 22

A generic deep convolutional representation

[Figure: one layer of the construction]
xk–1 : Ω → Hk–1, with values xk–1(u) ∈ Hk–1
Pkxk–1(v) ∈ Pk (patch extraction)
MkPkxk–1(v) = ϕk(Pkxk–1(v)) ∈ Hk (kernel mapping), so MkPkxk–1 : Ω → Hk
xk := AkMkPkxk–1 : Ω → Hk, with xk(w) = AkMkPkxk–1(w) ∈ Hk (linear pooling)

Page 23

Patch extraction operator Pk

Pkxk–1(u) := (v ∈ Sk ↦ xk–1(u+ v)) ∈ Pk = Hk–1^Sk.


Sk: patch shape, e.g. box.

Pk is linear, and preserves the norm: ‖Pkxk–1‖ = ‖xk–1‖.

Norm of a map: ‖x‖² = ∫Ω ‖x(u)‖² du < ∞ for x in L2(Ω,H).

Page 24

Non-linear pointwise mapping operator Mk

MkPkxk–1(u) := ϕk(Pkxk–1(u)) ∈ Hk.


Page 25

Non-linear pointwise mapping operator Mk

MkPkxk–1(u) := ϕk(Pkxk–1(u)) ∈ Hk.

ϕk : Pk → Hk pointwise non-linearity on patches.

We assume non-expansivity

‖ϕk(z)‖ ≤ ‖z‖ and ‖ϕk(z)− ϕk(z′)‖ ≤ ‖z − z′‖.

Mk then satisfies, for x, x′ ∈ L2(Ω,Pk)

‖Mkx‖ ≤ ‖x‖ and ‖Mkx−Mkx′‖ ≤ ‖x− x′‖.

Page 26

ϕk from kernels

Kernel mapping of homogeneous dot-product kernels:

Kk(z, z′) = ‖z‖‖z′‖ κk(〈z, z′〉 / (‖z‖‖z′‖)) = 〈ϕk(z), ϕk(z′)〉.

κk(u) = ∑_{j=0}^∞ bj u^j with bj ≥ 0, κk(1) = 1.

‖ϕk(z)‖ = Kk(z, z)^{1/2} = ‖z‖ (norm preservation).

‖ϕk(z)− ϕk(z′)‖ ≤ ‖z − z′‖ if κ′k(1) ≤ 1 (non-expansiveness).

Examples

κexp(〈z, z′〉) = e^{〈z,z′〉−1} = e^{−(1/2)‖z−z′‖²} (if ‖z‖ = ‖z′‖ = 1).

κinv-poly(〈z, z′〉) = 1/(2 − 〈z, z′〉).

[Schoenberg, 1942, Scholkopf, 1997, Smola et al., 2001, Cho and Saul, 2010, Zhang et al., 2016, 2017, Daniely et al., 2016, Bach, 2017, Mairal, 2016]...
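The identity for κexp on unit-norm vectors can be checked directly, since ‖z − z′‖² = 2 − 2〈z, z′〉 when ‖z‖ = ‖z′‖ = 1. A quick numerical check (a sketch, names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(8);  z /= np.linalg.norm(z)
zp = rng.standard_normal(8); zp /= np.linalg.norm(zp)

def K_exp(z, zp):
    """Homogeneous dot-product kernel with kappa(u) = exp(u - 1)."""
    nz, nzp = np.linalg.norm(z), np.linalg.norm(zp)
    return nz * nzp * np.exp(z @ zp / (nz * nzp) - 1.0)

# On unit vectors, exp(<z,z'> - 1) equals the Gaussian kernel
# exp(-||z - z'||^2 / 2), because ||z - z'||^2 = 2 - 2<z,z'>.
lhs = K_exp(z, zp)
rhs = np.exp(-0.5 * np.linalg.norm(z - zp) ** 2)
assert np.isclose(lhs, rhs)
```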

Page 28

Pooling operator Ak

xk(u) = AkMkPkxk–1(u) = ∫_{R^d} hσk(u− v) MkPkxk–1(v) dv ∈ Hk.


Page 29

Pooling operator Ak

xk(u) = AkMkPkxk–1(u) = ∫_{R^d} hσk(u− v) MkPkxk–1(v) dv ∈ Hk.

hσk : pooling filter at scale σk.

hσk(u) := σk^{−d} h(u/σk) with h(u) Gaussian.

linear, non-expansive operator: ‖Ak‖ ≤ 1 (operator norm).
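The non-expansiveness of Ak is Young's inequality: ‖h ∗ x‖₂ ≤ ‖h‖₁‖x‖₂ = ‖x‖₂ for a normalized filter. A discrete sketch (the filter construction is ours):

```python
import numpy as np

def gaussian_filter(sigma):
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    h = np.exp(-t**2 / (2 * sigma**2))
    return h / h.sum()                  # normalized: sum(h) = 1, so ||h||_1 = 1

def A(x, sigma):
    """Pooling operator: convolution with the Gaussian filter h_sigma."""
    return np.convolve(x, gaussian_filter(sigma), mode="same")

rng = np.random.default_rng(0)
# Young's inequality: ||h * x||_2 <= ||h||_1 ||x||_2 = ||x||_2,
# so the pooling operator is non-expansive (operator norm <= 1).
for _ in range(100):
    x = rng.standard_normal(256)
    assert np.linalg.norm(A(x, sigma=3.0)) <= np.linalg.norm(x) + 1e-9
```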

Page 30

Recap: Pk, Mk, Ak

xk := AkMkPkxk–1 : Ω → Hk (patch extraction Pk, kernel mapping Mk, linear pooling Ak).

Page 31

Multilayer construction

Assumption on x0

x0 is typically a discrete signal acquired with a physical device.

Natural assumption: x0 = A0x, with x the original continuous signal, A0 local integrator with scale σ0 (anti-aliasing).

Multilayer representation

Φn(x) = AnMnPnAn−1Mn−1Pn−1 · · · A1M1P1x0 ∈ L2(Ω,Hn).

Sk, σk grow exponentially in practice (i.e., fixed with subsampling).

Prediction layer

e.g., linear f(x) = 〈w, Φn(x)〉.

"linear kernel" K(x, x′) = 〈Φn(x), Φn(x′)〉 = ∫Ω 〈xn(u), x′n(u)〉 du.

Page 34

Outline

1 Construction of the multi-layer convolutional representation

2 Invariance and stability

3 Learning aspects: model complexity

Page 36

Invariance, definitions

τ : Ω→ Ω: C1-diffeomorphism with Ω = Rd.

Lτx(u) = x(u− τ(u)): action operator.

Much richer group of transformations than translations.

Definition of stability

Representation Φ(·) is stable [Mallat, 2012] if:

‖Φ(Lτx)− Φ(x)‖ ≤ (C1‖∇τ‖∞ + C2‖τ‖∞)‖x‖.

‖∇τ‖∞ = supu ‖∇τ(u)‖ controls deformation.

‖τ‖∞ = supu |τ(u)| controls translation.

C2 → 0: translation invariance.

[Mallat, 2012, Bruna and Mallat, 2013, Sifre and Mallat, 2013]...

Page 37

Warmup: translation invariance

Representation

Φn(x) := AnMnPnAn–1Mn–1Pn–1 · · · A1M1P1A0x.

How to achieve translation invariance?

Translation: Lcx(u) = x(u− c).

Equivariance: all operators commute with Lc, e.g., AkLc = LcAk (and similarly for Pk, Mk).

‖Φn(Lcx)− Φn(x)‖ = ‖LcΦn(x)− Φn(x)‖
≤ ‖LcAn −An‖ · ‖MnPnΦn–1(x)‖
≤ ‖LcAn −An‖ ‖x‖.

Mallat [2012]: ‖LτAn −An‖ ≤ (C2/σn)‖τ‖∞ (operator norm).

Scale σn of the last layer controls translation invariance.
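The role of the pooling scale can be seen numerically: pooled features of a translated signal approach those of the original as σn grows. A sketch (the discrete Gaussian filter, signal, and constants are ours):

```python
import numpy as np

def gaussian_pool(x, sigma):
    """A_n: convolution with a normalized Gaussian filter at scale sigma."""
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    h = np.exp(-t**2 / (2 * sigma**2)); h /= h.sum()
    return np.convolve(x, h, mode="same")

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
xc = np.roll(x, 3)                      # L_c x: translation by c = 3 samples

# ||A_n L_c x - A_n x|| should shrink as the pooling scale sigma_n grows,
# consistent with the bound ||L_c A_n - A_n|| <= (C_2 / sigma_n) c.
errs = [np.linalg.norm(gaussian_pool(xc, s) - gaussian_pool(x, s))
        / np.linalg.norm(x) for s in (1.0, 4.0, 16.0)]
print(errs)
```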

Page 41

Stability to deformations

Representation

Φn(x)M= AnMnPnAn–1Mn–1Pn–1 · · · A1M1P1A0x.

How to achieve stability to deformations?

Patch extraction Pk and pooling Ak do not commute with Lτ !

‖[Ak, Lτ ]‖ ≤ C1‖∇τ‖∞ [from Mallat, 2012].

But: [Pk, Lτ ] is unstable at high frequencies!

Adapt to current layer resolution, patch size controlled by σk–1:

‖[PkAk–1, Lτ ]‖ ≤ C1,κ‖∇τ‖∞ if sup_{u∈Sk} |u| ≤ κσk–1.

C1,κ grows as κ^{d+1} ⇒ more stable with small patches (e.g., 3x3, VGG et al.).

Page 47

Stability to deformations: final result

Theorem

If ‖∇τ‖∞ ≤ 1/2,

‖Φn(Lτx) − Φn(x)‖ ≤ ( C1,κ (n + 1) ‖∇τ‖∞ + (C2/σn) ‖τ‖∞ ) ‖x‖.

translation invariance: large σn.

stability: small patch sizes.

signal preservation: subsampling factor ≈ patch size.

=⇒ needs several layers.

requires additional discussion to make stability non-trivial.

(also valid for generic CNNs with ReLUs: multiply the bound by ∏k ρk = ∏k ‖Wk‖, but no signal preservation).

related work on stability [Wiatowski and Bolcskei, 2017]
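The C2‖τ‖∞/σn term quantifies translation invariance: a pure translation is forgiven at the rate set by the last pooling scale. A small numerical sketch (sizes and names are illustrative) of how the error decays as the pooling scale grows:

```python
import numpy as np

def gaussian_pool(x, sigma):
    """Pooling A_sigma: circular convolution with a normalized Gaussian."""
    n = len(x)
    d = np.minimum(np.arange(n), n - np.arange(n)).astype(float)
    h = np.exp(-d ** 2 / (2 * sigma ** 2))
    h /= h.sum()
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

rng = np.random.default_rng(0)
n, c = 512, 4                        # signal length, translation by c samples
x = rng.standard_normal(n)
x_shift = np.roll(x, c)              # L_tau x for the constant field tau = c

errs = [np.linalg.norm(gaussian_pool(x, s) - gaussian_pool(x_shift, s))
        / np.linalg.norm(x) for s in (4.0, 16.0, 64.0)]
print(errs)                          # decays roughly like c / sigma
```

Larger pooling scales make the pooled representations of a signal and its translate progressively closer, matching the ‖τ‖∞/σn term of the theorem.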


Beyond the translation group

Can we achieve invariance to other groups?

Group action: Lgx(u) = x(g−1u) (e.g., rotations, reflections).

Feature maps x(u) defined on u ∈ G (G: locally compact group).

Recipe: Equivariant inner layers + global pooling in last layer

Patch extraction:

Px(u) = (x(uv))v∈S .

Non-linear mapping: equivariant because pointwise!

Pooling (µ: left-invariant Haar measure):

Ax(u) = ∫G x(uv) h(v) dµ(v) = ∫G x(v) h(u−1v) dµ(v).

related work [Sifre and Mallat, 2013, Cohen and Welling, 2016, Raj et al., 2016]...
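The recipe can be checked end to end on the simplest non-trivial group, G = Z/nZ, where the group pooling above is an ordinary circular correlation. A minimal sketch (all names and sizes are illustrative):

```python
import numpy as np

# Equivariance recipe on the cyclic group G = Z/nZ: pooling is a group
# convolution, so it commutes with the action, and global pooling is invariant.
n = 12
rng = np.random.default_rng(1)
x = rng.standard_normal(n)                       # feature map x : G -> R
h = np.exp(-np.arange(n) / 3.0)
h /= h.sum()                                     # pooling filter h on G

def L(g, x):
    """Group action (L_g x)(u) = x(g^{-1} u); on Z/n this is a circular shift."""
    return np.roll(x, g)

def A(x):
    """Pooling (A x)(u) = sum_v x(u v) h(v), with u v = (u + v) mod n here."""
    return np.array([sum(x[(u + v) % n] * h[v] for v in range(n)) for u in range(n)])

g = 5
equivariance_gap = np.linalg.norm(A(L(g, x)) - L(g, A(x)))   # should vanish
invariance_gap = abs(A(L(g, x)).sum() - A(x).sum())          # global pooling
print(equivariance_gap, invariance_gap)
```

Both gaps vanish up to round-off: inner-layer pooling is equivariant, and summing over the whole group at the last layer is invariant.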


Group invariance and stability

Previous construction is similar to Cohen and Welling [2016] for CNNs.

A case of interest: the roto-translation group

G = R2 ⋊ SO(2) (mix of translations and rotations).

Stability with respect to the translation group.

Global invariance to rotations (only global pooling at final layer).

Inner layers: only pool on the translation group.

Last layer: global pooling on rotations.

Cohen and Welling [2016]: pooling on rotations in inner layers hurts performance on Rotated MNIST.


Discretization and signal preservation: example in 1D

Discrete signals x̄k in ℓ2(Z, Hk) vs continuous ones xk in L2(R, Hk).

x̄k: subsampling factor sk after pooling at scale σk ≈ sk:

x̄k[n] = AkMkPkx̄k–1[nsk].

Claim: We can recover x̄k−1 from x̄k if the factor sk ≤ patch size.

How? Recover patches with linear functions (contained in Hk)

〈fw, MkPkxk−1(u)〉 = fw(Pkxk−1(u)) = 〈w, Pkxk−1(u)〉,

and Pkxk−1(u) = ∑w∈B 〈fw, MkPkxk−1(u)〉 w.

Warning: no claim that recovery is practical and/or stable.
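The claim can be illustrated with a toy 1D computation, taking Mk to be the identity (no kernel map) so that recovery amounts to undoing strided patch extraction; names and sizes are illustrative:

```python
import numpy as np

# Toy check: patches of size p extracted with stride s <= p cover every sample,
# so the original signal can be recovered exactly from the subsampled patches.
rng = np.random.default_rng(2)
n, p, s = 24, 4, 3                               # signal length, patch size, stride
x = rng.standard_normal(n)

# circular patch extraction, subsampled by s
patches = np.array([[x[(i * s + j) % n] for j in range(p)] for i in range(n // s)])

# reconstruction: average every patch entry covering each position
recon = np.zeros(n)
counts = np.zeros(n)
for i in range(n // s):
    for j in range(p):
        recon[(i * s + j) % n] += patches[i, j]
        counts[(i * s + j) % n] += 1
recon /= counts
print(np.max(np.abs(recon - x)))                 # exact recovery
```

With s > p some samples would never be observed and no such recovery is possible, which is the content of the "subsampling factor ≤ patch size" condition.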


[Figure: the 1D pipeline — xk−1 → patch extraction Pk (patches Pkxk−1(u) ∈ Pk) → dot-product kernel map Mk → linear pooling Ak → downsampling → x̄k; recovery goes back via linear measurements to Akxk−1, then deconvolution to xk−1.]


Outline

1 Construction of the multi-layer convolutional representation

2 Invariance and stability

3 Learning aspects: model complexity


RKHS of patch kernels Kk

Kk(z, z′) = ‖z‖ ‖z′‖ κ( 〈z, z′〉 / (‖z‖ ‖z′‖) ),  with κ(u) = ∑∞j=0 bj u^j.

What does the RKHS contain?

RKHS contains homogeneous functions:

f : z ↦ ‖z‖ σ(〈g, z〉/‖z‖).

Smooth activations: σ(u) = ∑∞j=0 aj u^j with aj ≥ 0.

Norm: ‖f‖2Hk ≤ C2σ(‖g‖2) = ∑∞j=0 (a2j / bj) ‖g‖2j < ∞.

Homogeneous version of [Zhang et al., 2016, 2017]
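As a quick numerical check that such Kk are valid kernels, the sketch below builds the Gram matrix for one admissible choice, κ(u) = e^{u−1} (nonnegative coefficients bj = e^{−1}/j!; an illustrative choice, not necessarily the talk's):

```python
import numpy as np

# The homogeneous dot-product kernel with kappa(u) = exp(u - 1) is positive
# semi-definite, since kappa has nonnegative Taylor coefficients.
def homogeneous_kernel(Z, kappa=lambda u: np.exp(u - 1.0)):
    """K(z, z') = ||z|| ||z'|| kappa(<z, z'> / (||z|| ||z'||)) on the rows of Z."""
    norms = np.linalg.norm(Z, axis=1)
    cos = np.clip((Z @ Z.T) / np.outer(norms, norms), -1.0, 1.0)
    return np.outer(norms, norms) * kappa(cos)

rng = np.random.default_rng(3)
Z = rng.standard_normal((40, 8))
K = homogeneous_kernel(Z)
eigs = np.linalg.eigvalsh(K)
print(eigs.min())                                # >= 0 up to round-off
```

Note also that K(z, z) = ‖z‖² κ(1) = ‖z‖² for this κ, so the kernel preserves the norm of patches.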


RKHS of patch kernels Kk

Examples:

σ(u) = u (linear): C2σ(λ2) = O(λ2).

σ(u) = up (polynomial): C2σ(λ2) = O(λ2p).

σ ≈ sin, sigmoid, smooth ReLU: C2σ(λ2) = O(e^{cλ2}).

[Figure: left — smoothed ReLU activations f : x ↦ σ(x) compared with the ReLU; right — homogeneous RKHS functions f : x ↦ |x| σ(w x/|x|) for w ∈ {0, 0.5, 1, 2}, compared with the ReLU (w = 1).]


Constructing a CNN in the RKHS HK

Some CNNs live in the RKHS: “linearization” principle

f(x) = σk(Wkσk–1(Wk–1 . . . σ2(W2σ1(W1x)) . . .)) = 〈f,Φ(x)〉H.

Consider a CNN with filters W ijk (u), u ∈ Sk.

k: layer; i: index of filter; j: index of input channel.

“Smooth homogeneous” activations σ.

The CNN can be constructed hierarchically in HK.

Norm:

‖fσ‖2 ≤ ‖Wn+1‖22 C2σ(‖Wn‖22 C2σ(‖Wn–1‖22 C2σ(· · · ))).

Linear layers: product of spectral norms.
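For the linear case the bound is easy to compute explicitly; a sketch with hypothetical random weight matrices (shapes and scaling are arbitrary, for illustration only):

```python
import numpy as np

# For linear layers (sigma = identity), the RKHS norm bound collapses to the
# product of squared spectral norms prod_k ||W_k||_2^2.
rng = np.random.default_rng(4)
shapes = [(16, 8), (16, 16), (1, 16)]            # hypothetical W1, W2, W3
Ws = [rng.standard_normal(s) / np.sqrt(s[1]) for s in shapes]

spec = [np.linalg.norm(W, 2) for W in Ws]        # spectral norm = largest singular value
bound = float(np.prod([s ** 2 for s in spec]))   # prod_k ||W_k||_2^2
print(spec, bound)
```

This is the same quantity that appears in the spectral-norm generalization bounds discussed next.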


Link with generalization

Direct application of classical generalization bounds

Simple bound on Rademacher complexity for linear/kernel methods:

FB = {f ∈ HK : ‖f‖ ≤ B} =⇒ RadN(FB) ≤ O(BR/√N).

Leads to a margin bound O(‖fN‖ R / (γ√N)) for a learned CNN fN with margin (confidence) γ > 0.

Related to recent generalization bounds for neural networks basedon product of spectral norms [e.g., Bartlett et al., 2017,Neyshabur et al., 2018].

[see, e.g., Boucheron et al., 2005, Shalev-Shwartz and Ben-David, 2014]...
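For a kernel method, the empirical Rademacher complexity of the ball has the closed form sup over ‖f‖ ≤ B of (1/N)∑i εi f(xi) = (B/N)√(εᵀKε), which makes the O(BR/√N) bound easy to check by Monte Carlo; a sketch with a linear kernel (all sizes are illustrative):

```python
import numpy as np

# Empirical Rademacher complexity of F_B = {f : ||f|| <= B} in an RKHS:
# sup_{||f||<=B} (1/N) sum_i eps_i f(x_i) = (B/N) sqrt(eps^T K eps).
rng = np.random.default_rng(5)
N, B = 200, 1.0
X = rng.standard_normal((N, 5))
R = np.max(np.linalg.norm(X, axis=1))            # radius of the data
K = X @ X.T                                      # linear kernel, for simplicity
E = rng.choice([-1.0, 1.0], size=(2000, N))      # Monte-Carlo sign vectors
rad = B * np.mean(np.sqrt(np.einsum('ij,jk,ik->i', E, K, E))) / N
print(rad, B * R / np.sqrt(N))                   # empirical value vs the BR/sqrt(N) bound
```

The Monte-Carlo estimate sits below B·R/√N, as Jensen's inequality predicts (E√(εᵀKε) ≤ √(tr K) ≤ R√N).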


Deep convolutional representations: conclusions

Study of generic properties of signal representation

Deformation stability with small patches, adapted to resolution.

Signal preservation when subsampling ≤ patch size.

Group invariance by changing patch extraction and pooling.

Applies to learned models

Same quantity ‖f‖ controls stability and generalization.

“higher capacity” is needed to discriminate small deformations.

Questions:

Better regularization?

How does SGD control capacity in CNNs?

What about networks with no pooling layers? ResNet?


References I

Stéphanie Allassonnière, Yali Amit, and Alain Trouvé. Towards a coherent statistical framework for dense deformable template estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(1):3–29, 2007.

Francis Bach. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research (JMLR), 18:1–38, 2017.

Peter Bartlett, Dylan J. Foster, and Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1706.08498, 2017.

Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.

Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(8):1872–1886, 2013.


References II

Y. Cho and L. K. Saul. Large-margin classification in infinite neural networks. Neural Computation, 22(10):2678–2697, 2010.

Taco Cohen and Max Welling. Group equivariant convolutional networks. In International Conference on Machine Learning (ICML), 2016.

Amit Daniely, Roy Frostig, and Yoram Singer. Toward deeper understanding of neural networks: the power of initialization and a dual view on expressivity. In Advances in Neural Information Processing Systems, pages 2253–2261, 2016.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

J. Mairal. End-to-end kernel learning with supervised convolutional kernel networks. In Advances in Neural Information Processing Systems (NIPS), 2016.

Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.


References III

Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nathan Srebro. A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.

Anant Raj, Abhishek Kumar, Youssef Mroueh, P. Thomas Fletcher, and Bernhard Schölkopf. Local group invariant representations via orbit embeddings. preprint arXiv:1612.01988, 2016.

I. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 1942.

B. Schölkopf. Support Vector Learning. PhD thesis, Technische Universität Berlin, 1997.

Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

John Shawe-Taylor and Nello Cristianini. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2004.


References IV

Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

Alex J. Smola and Bernhard Schölkopf. Sparse greedy matrix approximation for machine learning. In Proceedings of the International Conference on Machine Learning (ICML), 2000.

Alex J. Smola, Zoltán L. Óvári, and Robert C. Williamson. Regularization with dot-product kernels. In Advances in Neural Information Processing Systems, pages 308–314, 2001.

Alain Trouvé and Laurent Younes. Local geometry of deformable templates. SIAM Journal on Mathematical Analysis, 37(1):17–59, 2005.

Thomas Wiatowski and Helmut Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Transactions on Information Theory, 2017.

C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems (NIPS), 2001.


References V

Kai Zhang, Ivor W. Tsang, and James T. Kwok. Improved Nyström low-rank approximation and error analysis. In International Conference on Machine Learning (ICML), 2008.

Y. Zhang, P. Liang, and M. J. Wainwright. Convexified convolutional neural networks. In International Conference on Machine Learning (ICML), 2017.

Yuchen Zhang, Jason D. Lee, and Michael I. Jordan. ℓ1-regularized neural networks are improperly learnable in polynomial time. In International Conference on Machine Learning (ICML), 2016.


ϕk from kernel approximations: CKNs [Mairal, 2016]

Approximate ϕk(z) by projection (Nystrom approximation) on

F = Span(ϕk(z1), . . . , ϕk(zp)).

[Figure: Nyström approximation — ϕ(x) and ϕ(x′) in the Hilbert space H are projected onto the finite-dimensional subspace F.]


Leads to a tractable, p-dimensional representation ψk(z).

Norm is preserved, and the projection is non-expansive:

‖ψk(z) − ψk(z′)‖ = ‖Πkϕk(z) − Πkϕk(z′)‖ ≤ ‖ϕk(z) − ϕk(z′)‖ ≤ ‖z − z′‖.

Anchor points z1, . . . , zp (≈ filters) can be learned from data (K-means or backprop).

[Williams and Seeger, 2001, Smola and Scholkopf, 2000, Zhang et al., 2008]...
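The projection step can be made concrete in finite dimensions: with anchors z1, . . . , zp, the Nyström feature ψ(z) = KZZ^{−1/2} (K(zi, z))i reproduces 〈Πϕ(z), Πϕ(z′)〉. A sketch with an illustrative κ(u) = e^{u−1} (an assumption, not necessarily the talk's choice):

```python
import numpy as np

def K(A, B):
    """Dot-product kernel on rows: K(z,z') = ||z|| ||z'|| kappa(cos), kappa(u)=exp(u-1)."""
    na, nb = np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1)
    cos = np.clip((A @ B.T) / np.outer(na, nb), -1.0, 1.0)
    return np.outer(na, nb) * np.exp(cos - 1.0)

rng = np.random.default_rng(7)
anchors = rng.standard_normal((10, 5))           # anchor points z_1, ..., z_p
w, V = np.linalg.eigh(K(anchors, anchors))
inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T

def psi(z):
    """Nystrom feature psi(z) = K_ZZ^{-1/2} [K(z_i, z)]_i, the projected map in coordinates."""
    return inv_sqrt @ K(anchors, z[None, :])[:, 0]

z, zp = rng.standard_normal(5), rng.standard_normal(5)
d_psi = np.linalg.norm(psi(z) - psi(zp))
d_phi = np.sqrt(K(z[None], z[None])[0, 0] + K(zp[None], zp[None])[0, 0]
                - 2 * K(z[None], zp[None])[0, 0])
print(d_psi, d_phi)                              # projection is non-expansive: d_psi <= d_phi
```

The printed distances illustrate the non-expansiveness used in the slide: projected features are never farther apart than the exact kernel features.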


ϕk from kernel approximations: CKNs [Mairal, 2016]

Convolutional kernel networks in practice.

[Figure: CKN pipeline — two patches x, x′ of image I0 are mapped by the kernel trick to ϕ1(x), ϕ1(x′) in the Hilbert space H1, projected onto F1 to give ψ1(x), ψ1(x′) in map M1, and linear pooling produces I1.]


Discussion

The norm ‖Φ(x)‖ is of the same order as (or close enough to) ‖x‖.

The kernel representation is non-expansive but not contractive:

supx,x′∈L2(Ω,H0) ‖Φ(x) − Φ(x′)‖ / ‖x − x′‖ = 1.


Future of Convolutional Neural Networks

What are current high-potential problems to solve?

1 lack of robustness (see next slide).

2 learning with few labeled data.

3 learning with no supervision (see Tab. from Bojanowski and Joulin, 2017).


Future of Convolutional Neural Networks

Illustration of instability. Picture from Kurakin et al. [2016].

Figure: Adversarial examples are generated by computer; then printed on paper;a new picture taken on a smartphone fools the classifier.


Future of Convolutional Neural Networks

minf∈F (1/n) ∑ni=1 L(yi, f(xi)) + λΩ(f),

where the sum is the empirical risk (data fit) and λΩ(f) is the regularization.

The issue of regularization

today, heuristics are used (DropOut, weight decay, early stopping)...

...but they are not sufficient.

how to control variations of prediction functions?

|f(x) − f(x′)| should be small when x and x′ are “similar”.

what does it mean for x and x′ to be “similar”?

what should be a good regularization function Ω?
