Lecture 10: CNNs on Graphs
CMSC 35246: Deep Learning
Shubhendu Trivedi & Risi Kondor
University of Chicago
April 26, 2017


Page 2: Lecture 10 CNNs on Graphs - University of Chicagoshubhendu/Pages/Files/Lecture10_pauses.… · Lecture 10 CNNs on Graphs CMSC 35246. Data on Irregular Domains Often we can have structured

Two Scenarios

For CNNs on graphs, we have two distinct scenarios:

• Scenario 1: Each data point lives in R^d, but the dataset has an underlying graph structure
  - Each coordinate is a value associated with a vertex of the underlying graph
  - For images: the underlying graph is always a grid of fixed dimensions

• Scenario 2: Each data point is itself a graph (example regression task: molecules as input, boiling points as output)
  - Each graph can be of a different size
  - Sub-problem: given a graph G, find an embedding φ : G → R^p



Scenario 1

CNNs on data in irregular domains


CNNs on Grids

So far we have defined CNNs on grids

We model images and feature maps as functions on a rectangular domain

f : Z^2 → R^K

In general the grid can be Z^d

CNNs are able to exploit various structures that reduce sample complexity:

• Translation structure (allowing the use of filters)
• Metric on the grid (allows compactly supported filters)
• Multiscale structure of the grid (allows subsampling)


CNNs on Grids

[Figure: a grid signal Z^2(i) and its translate Z^2(i − i0)]

The translation group acts on Z^2

We are able to exploit this symmetry of the grid in CNNs
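This symmetry can be checked directly: convolution commutes with translation. A minimal numpy sketch on a 1-D periodic grid (the signal x and filter g are arbitrary toy values, not from the slides):

```python
import numpy as np

# Translation equivariance on a 1-D periodic grid: convolving a shifted
# signal equals shifting the convolved signal.
x = np.array([0., 1., 2., 3.])
g = np.array([1., -1., 0., 0.])

def circ_conv(a, b):
    """Circular convolution via the discrete Fourier transform."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

lhs = circ_conv(np.roll(x, 1), g)   # convolve the translated signal
rhs = np.roll(circ_conv(x, g), 1)   # translate the convolved signal
print(np.allclose(lhs, rhs))        # True
```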


CNNs on Grids

If we have n input pixels, a fully connected network with m outputs has nm parameters, roughly O(n^2)

With k filters, each with support S, we have O(kS) parameters (independent of n)

Using the multiscale nature, we can pool and reduce the number of parameters further
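To make the counts concrete, a small arithmetic sketch (the sizes n, m, k, S below are hypothetical):

```python
# Hypothetical sizes: n input pixels, m fully connected outputs,
# k convolutional filters each of support S.
n, m = 1024, 1024
k, S = 32, 9          # e.g. 32 filters of size 3x3

fc_params = n * m     # fully connected: O(n^2) when m is on the order of n
conv_params = k * S   # convolutional: O(kS), independent of n

print(fc_params, conv_params)  # 1048576 288
```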


Data on Irregular Domains

Often we have structured data defined over coordinates that do not enjoy any of these properties

Example: 3-D mesh data (each coordinate might be surface tension)

More: social network data, protein interaction networks, etc.

In each case we again have n coordinates, but they don't live on a regular grid

Figure source: Eurocom Face Modeling


Functions on Graphs

We can think of an n-dimensional image as a function defined on the vertices of a graph G = (Ω, E) with |Ω| = n

G just happens to be a grid graph with strong local structure, which makes CNNs useful

In general we can have signals defined over a general graph:


Functions on Graphs

[Figure: an example graph with vertices labeled 1–9]

Ω is the vertex set (input coordinates); W_{i,j} is the similarity between any two coordinates i and j

Note: W_{i,j} is the similarity between coordinates, not data points


Functions on Graphs

[Figure: an example graph with vertices labeled 1–9]

If the underlying graph structure is known, W_{i,j} will be available

If unknown: need to estimate it from the training data


Spatial Construction

Locally Connected Networks


Spatial Construction

So we replace the grid by a general graph G = (Ω, E)

The notion of locality can be generalized easily via W

For a given W and threshold δ, we have neighborhoods:

N_δ(j) = { i ∈ Ω : W_{i,j} > δ }

We can have filters with receptive fields given by these neighborhoods

Number of parameters: O(Sn) (S is the average neighborhood size)
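The thresholded neighborhoods can be computed directly from W. A minimal sketch, where the similarity matrix W and threshold are toy values:

```python
import numpy as np

# Generalized locality: the neighborhood of vertex j is
# N_delta(j) = { i : W[i, j] > delta }.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.6],
              [0.1, 0.6, 0.0]])

def neighborhood(W, j, delta):
    """Indices i whose similarity to j exceeds the threshold."""
    return set(np.flatnonzero(W[:, j] > delta))

print(neighborhood(W, 0, 0.5))  # {1}
print(neighborhood(W, 1, 0.5))  # {0, 2}
```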


Spatial Construction

To mimic subsampling and pooling, we can do a multiscale clustering of the graph (K scales)

Set Ω_0 = Ω; at each level k = 1, …, K define Ω_k from Ω_{k−1}

Ω_k is a partition of Ω_{k−1} into d_k clusters

Around every element of Ω_{k−1}, we can define the neighborhood

N_k = { N_{k,i} : i = 1, …, d_{k−1} }
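Pooling over such a clustering just aggregates the signal within each cluster. A toy sketch (the partition below is an assumption for illustration, not a computed clustering):

```python
import numpy as np

# Omega_1 partitions the 6 vertices of Omega_0 into 3 clusters;
# pooling averages the signal over each cluster.
partition = [[0, 1, 2], [3, 4], [5]]
x = np.arange(6, dtype=float)        # signal on Omega_0

pooled = np.array([x[c].mean() for c in partition])
print(pooled)  # means 1.0, 3.5, 5.0
```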


Defining the Network

Let the number of filters at layer k be given by f_k

Every layer transforms an f_{k−1}-dimensional signal indexed by Ω_{k−1} into an f_k-dimensional signal indexed by Ω_k

If x_k = (x_{k,i} ; i = 1, …, f_{k−1}) is the d_{k−1} × f_{k−1} dimensional input to layer k, the output is defined as:

x_{k+1,j} = L_k h( Σ_{i=1}^{f_{k−1}} F_{k,i,j} x_{k,i} ),  with j = 1, …, f_k

F_{k,i,j} is a d_{k−1} × d_{k−1} sparse matrix whose nonzero pattern is given by the neighborhoods N_k

h is the non-linearity and L_k is the pooling operation
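The layer above can be sketched in a few lines of numpy. All sizes, the random sparsity mask, and the pairs-of-vertices pooling are toy assumptions, not the slides' construction:

```python
import numpy as np

# One spatial-construction layer: output map j is
# L_k h( sum_i F[k,i,j] @ x[:, i] ).
rng = np.random.default_rng(0)
d_prev, f_prev, f_next = 6, 2, 3               # d_{k-1}, f_{k-1}, f_k

# Sparse filter matrices: zero out entries outside a toy neighborhood pattern
mask = rng.random((d_prev, d_prev)) < 0.4
F = rng.standard_normal((f_prev, f_next, d_prev, d_prev)) * mask

x = rng.standard_normal((d_prev, f_prev))      # input signal on Omega_{k-1}
h = np.tanh                                    # nonlinearity

def pool(v):
    """Toy L_k: average fixed pairs of vertices (d_k = d_{k-1} / 2)."""
    return v.reshape(-1, 2).mean(axis=1)

y = np.stack([pool(h(sum(F[i, j] @ x[:, i] for i in range(f_prev))))
              for j in range(f_next)], axis=1)
print(y.shape)  # (3, 3): d_k = 3 clusters, f_k = 3 feature maps
```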


Locally Connected Networks: In Pictures

Level 1 clustering

This and next few illustrations are by Joan Bruna


Locally Connected Networks: In Pictures

Pooling to get Ω1


Locally Connected Networks: In Pictures

Level 2 clustering


Locally Connected Networks: In Pictures

Multiple Feature maps: Level 1


Locally Connected Networks: In Pictures

Multiple Feature maps: Level 2


Spectral Construction

Spectral Networks


Quick Digression: The Graph Laplacian


Spectral Networks

Again consider W ∈ R^{d×d}, the weighted adjacency matrix for G = (Ω, E)

We consider the following definition of the graph Laplacian:

L = I − D^{−1/2} W D^{−1/2}

D is a diagonal matrix (the degree matrix) with D_{i,i} = Σ_j W_{i,j}

Let U = [u_1, …, u_d] be the eigenvectors of L
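This is directly computable. A sketch on a toy 3-vertex graph (a path: vertex 0 joined to vertices 1 and 2):

```python
import numpy as np

# Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2} and its
# eigenvectors U, the basis used by the spectral construction.
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
deg = W.sum(axis=1)                          # degrees D_{i,i}
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(3) - D_inv_sqrt @ W @ D_inv_sqrt

eigvals, U = np.linalg.eigh(L)               # L is symmetric, so eigh applies
print(eigvals.round(6))                      # smallest eigenvalue is 0
```

For this graph the spectrum is {0, 1, 2}; the zero eigenvalue is generic for connected graphs, with eigenvector proportional to D^{1/2}1.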


Graph Convolution in Frequency Domain

Define the convolution of an input signal x with a filter g on G as:

x ∗_G g = U^T (Ux ⊙ Ug)

Learning filters on a graph ⟹ learning spectral weights:

x ∗_G g = U^T (diag(w_g) Ux),  with w_g = (w_1, …, w_d)
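A minimal sketch of the second form, reusing the toy 3-vertex Laplacian; the signal x and multipliers w_g are arbitrary illustrative values. Note numpy's `eigh` returns eigenvectors as columns, so we transpose to match the slides' convention that Ux is the forward transform:

```python
import numpy as np

# Spectral graph convolution: x *_G g = U^T ( diag(w_g) U x ).
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
L = np.eye(3) - D_inv_sqrt @ W @ D_inv_sqrt
_, U = np.linalg.eigh(L)
U = U.T                              # rows = eigenvectors, so Ux transforms x

x = np.array([1.0, 2.0, 3.0])        # signal on the vertices
w_g = np.array([1.0, 0.5, 0.0])      # learned spectral multipliers (toy)

y = U.T @ (np.diag(w_g) @ (U @ x))   # x *_G g
print(y.shape)  # (3,)
```

Setting w_g to all ones recovers x exactly, since U is orthogonal.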


Local Filters

Notice that g has support over all vertices

But we want filters that are local

Observation: smoothness in the frequency domain ⟹ spatial decay

Solution: consider a smoothing kernel K ∈ R^{d×d_0} and search for multipliers of the form:

w_g = K w̃_g,  with w̃_g ∈ R^{d_0}
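One concrete choice of smoothing kernel is linear interpolation of d_0 control values onto d frequencies; this specific K is an assumption for illustration (cubic splines are another common choice):

```python
import numpy as np

# Localization trick: d spectral multipliers from d0 << d parameters,
# w_g = K @ w_tilde, with K a fixed smoothing (interpolation) kernel.
d, d0 = 8, 3
K = np.zeros((d, d0))
xs = np.linspace(0, 1, d)            # frequency positions
cs = np.linspace(0, 1, d0)           # control-point positions
for j in range(d0):                  # hat functions = piecewise-linear interp
    K[:, j] = np.clip(1 - np.abs(xs - cs[j]) * (d0 - 1), 0, None)

w_tilde = np.array([1.0, 0.2, 0.0])  # d0 learned parameters
w_g = K @ w_tilde                    # d smooth spectral multipliers
print(w_g.shape)  # (8,)
```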


Graph Convolution Layer

Forward Pass:

• For input x, compute the interpolated weights wf′f = K w̃f′f

• Compute the output: ysf′ = U^T(∑f wf′f ⊙ (U xsf))

Backward Pass:

• Compute the gradient w.r.t. the input ∆xsf

• Compute the gradient w.r.t. the interpolated weights ∆wf′f

• Compute the gradient w.r.t. the free weights ∆w̃f′f = K^T ∆wf′f
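The forward pass above can be sketched as follows. Everything concrete here is an assumption: the sizes, the random data, and the stand-ins for U (a random orthogonal basis) and K (a random kernel); only the computation y_{sf′} = U^T(∑_f (K w̃_{f′f}) ⊙ U x_{sf}) is from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d0, F_in, F_out = 8, 3, 2, 4            # assumed sizes

# Stand-ins for the layer's ingredients: an orthogonal Fourier basis U
# (rows = eigenvectors) and an interpolation kernel K.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
U = Q.T
K = rng.normal(size=(d, d0))

def layer_forward(X, W_free):
    """X: (F_in, d) input maps; W_free: (F_out, F_in, d0) free weights.
    Returns Y with Y[fp] = U^T ( sum_f (K @ W_free[fp, f]) * (U @ X[f]) )."""
    Xhat = X @ U.T                          # row f is U x_f
    Y = np.zeros((W_free.shape[0], X.shape[1]))
    for fp in range(W_free.shape[0]):
        acc = np.zeros(X.shape[1])
        for f in range(X.shape[0]):
            acc += (K @ W_free[fp, f]) * Xhat[f]   # interpolated multipliers
        Y[fp] = U.T @ acc                   # back to the vertex domain
    return Y

X = rng.normal(size=(F_in, d))
W_free = rng.normal(size=(F_out, F_in, d0))
Y = layer_forward(X, W_free)               # shape (F_out, d)
```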


What if the Graph Structure is Unknown?

Estimate it from data:

Method 1: Unsupervised

• Given a dataset X ∈ R^{N×d}, compute the distance d(i, j) between features i and j:

d(i, j) = ‖Xi − Xj‖₂²

• Then compute Wi,j = exp(−d(i, j)/σ²)
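The unsupervised estimate can be sketched as below: pairwise squared distances between the d feature columns of X, passed through a Gaussian kernel, give a d × d affinity matrix. The random data and the median-based choice of σ² are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 8
X = rng.normal(size=(N, d))               # assumed random dataset

# Squared distances between feature columns X[:, i] and X[:, j].
sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # (d, d)

sigma2 = np.median(sq[sq > 0])            # bandwidth heuristic (assumption)
W = np.exp(-sq / sigma2)                  # Gaussian affinity W_ij
```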


What if the Graph Structure is Unknown?

Estimate it from data:

Method 2: Supervised

• Given a dataset X ∈ R^{N×d} and labels y ∈ {1, . . . , C}^N, train a fully connected MLP with K layers and weights W1, . . . , WK

• Pass the data through the network, extract the layer-K features WK ∈ R^{N×mK}, then compute:

d(i, j) = ‖WK,i − WK,j‖₂²

• Use the Gaussian kernel as before to get Wi,j


Scenario 2

Learning Embeddings of Graphs


Example Task: Regression

Input: Organic Compounds (graphs)

Output: Boiling point


Graph Embedding: Simple Algorithm

Algorithm 1 Generation of embedding

Require: G = (V, E), radius δ, hidden weights H_L^N (one per layer L = 1, . . . , δ and vertex degree N), output weights W1, . . . , Wδ
Initialize: embedding φ ← 0
Initialize: for every vertex, rv ← Ψ(v) (local vertex features)

1: for L = 1 to δ (for every layer) do
2:   for each vertex v in the graph do
3:     r1, . . . , rN = neighbors(v)
4:     v ← rv + ∑_{i=1}^{N} ri
5:     rv ← σ(v H_L^N)
6:     i ← softmax(rv WL)
7:     Update: φ ← φ + i
8:   end for
9: end for
10: Output embedding φ
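Algorithm 1 can be sketched in NumPy as below. Three simplifications are assumptions, not part of the slide: a single hidden matrix H[L] per layer instead of one per vertex degree, simultaneous (rather than in-place) vertex updates within a layer, and σ taken to be tanh.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def graph_embedding(adj, feats, H, W):
    """adj: (n, n) adjacency; feats: (n, m) vertex features r_v = Psi(v);
    H: list of delta (m, m) hidden matrices; W: list of delta (m, p)
    output matrices.  Returns the p-dimensional embedding phi."""
    r = feats.copy()
    phi = np.zeros(W[0].shape[1])
    for L in range(len(H)):                     # for every layer
        new_r = np.empty_like(r)
        for v in range(adj.shape[0]):           # for each vertex
            nbrs = np.nonzero(adj[v])[0]
            s = r[v] + r[nbrs].sum(axis=0)      # v <- r_v + sum of neighbours
            new_r[v] = np.tanh(s @ H[L])        # r_v <- sigma(v H)
            phi += softmax(new_r[v] @ W[L])     # soft-histogram update of phi
        r = new_r
    return phi

# Toy run on a triangle graph (assumed example).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
n, m, p, delta = 3, 4, 5, 2
phi = graph_embedding(adj, rng.normal(size=(n, m)),
                      [rng.normal(size=(m, m)) for _ in range(delta)],
                      [rng.normal(size=(m, p)) for _ in range(delta)])
```

Because each softmax contributes a vector summing to 1, the entries of φ sum to n·δ: the embedding is a soft histogram of vertex states across layers.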
