UNSUPERVISED LEARNING IN COMPUTER VISION
Jordan Campbell
Overview
- Convolutional deep belief networks (CDBNs)
- Sparse coding
Natural Images
- 2-D matrix of intensity values.
- Exhibit certain statistical properties.
- Typically composed of edges, object parts and objects.
- Can we train networks to learn representations of natural images?
Network Representation
[Figure: sparse coding basis functions learnt from 16x16 image patches of natural scenes]
[Figure: CDBN first-layer bases (edges, top box) and second-layer bases (object parts, second box)]
Sparse coding
- Mammalian V1 simple cells are localised, oriented and bandpass.
- A sparse code produces basis functions with these properties and provides a highly efficient representation of the original image (it removes redundancy).
Sparse coding - Goal
- The aim was to find a set of basis functions that were:
  - Sparse
  - Highly representative
- Natural images are highly non-Gaussian and not well described by orthogonal components, so PCA is not appropriate.
Sparse coding - algorithm
- The image is represented as a linear combination of basis functions: $I(x,y) = \sum_i a_i \phi_i(x,y)$
- The goal of learning is to minimise the cost function: $E = \sum_{x,y}\big[I(x,y) - \sum_i a_i \phi_i(x,y)\big]^2 + \lambda \sum_i S(a_i/\sigma)$
Sparse coding – Equations
- Accuracy: the reconstruction error $\sum_{x,y}\big[I(x,y) - \hat{I}(x,y)\big]^2$, where $\hat{I}(x,y) = \sum_i a_i \phi_i(x,y)$
- Sparseness: the penalty $\lambda \sum_i S(a_i/\sigma)$, with a nonlinearity such as $S(x) = \log(1+x^2)$
- Learning: gradient descent on the bases, $\Delta \phi_i(x,y) = \eta \, a_i \big[I(x,y) - \hat{I}(x,y)\big]$, with the activations $a_i$ found by minimising $E$ for each image
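The accuracy, sparseness and learning terms can be sketched numerically. A minimal NumPy sketch of the Olshausen–Field scheme, with hypothetical sizes, learning rates and random placeholder data in place of real image patches:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bases = 256, 192            # 16x16 patches, 192 basis functions
phi = rng.standard_normal((n_pixels, n_bases)) * 0.1  # random initial bases
I = rng.standard_normal(n_pixels)       # one image patch (placeholder data)
lam, sigma, eta_a, eta_phi = 0.1, 1.0, 0.01, 1e-4

def S_prime(x):                         # derivative of the penalty S(x) = log(1 + x^2)
    return 2.0 * x / (1.0 + x ** 2)

# Inference: gradient descent on the activations a_i for this patch,
# balancing the accuracy term against the sparseness penalty
a = np.zeros(n_bases)
for _ in range(200):
    residual = I - phi @ a              # accuracy term: I - I_hat
    a += eta_a * (phi.T @ residual - (lam / sigma) * S_prime(a / sigma))

# Learning: Hebbian-style update of the bases on the residual
phi += eta_phi * np.outer(I - phi @ a, a)
```

In the full algorithm the inference loop runs per image patch and the basis update is averaged over many patches; the step sizes here are illustrative.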
Sparse coding - results
- 192 basis functions were trained on 16x16 image patches of natural scenes.
- The resultant functions were localised, oriented and bandpass.
- Note that these are only the main spatial properties of simple-cell receptive fields; simple cells have others, and there are other, more complex cell types elsewhere in the visual hierarchy.
CDBN - preliminaries
- RBM:
  - Two-layer undirected bipartite graph.
  - Assume binary activations.
  - Gibbs sampling for learning and inference.
- Convolutional RBM:
  - K groups of hidden and pooling units.
  - Weight sharing (translation invariance).
- Probabilistic max-pooling:
  - Shrinks the representation of the detection layer.
- Sparsity regularisation.
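A minimal sketch of alternating Gibbs sampling in a binary RBM (toy sizes and random weights; all names are hypothetical, and the convolutional weight sharing and pooling are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 36, 16          # toy layer sizes
W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b, c = np.zeros(n_visible), np.zeros(n_hidden)  # visible / hidden biases

def sample_h_given_v(v):
    """One half of a Gibbs step: P(h_j = 1 | v) = sigmoid(c_j + v . W[:, j])."""
    p = sigmoid(c + v @ W)
    return p, (rng.random(n_hidden) < p).astype(float)

def sample_v_given_h(h):
    """Other half of the step: P(v_i = 1 | h) = sigmoid(b_i + W[i, :] . h)."""
    p = sigmoid(b + W @ h)
    return p, (rng.random(n_visible) < p).astype(float)

v = (rng.random(n_visible) < 0.5).astype(float)  # random binary start
for _ in range(10):                   # alternating Gibbs sampling
    _, h = sample_h_given_v(v)
    _, v = sample_v_given_h(h)
```

The bipartite structure is what makes this efficient: given one layer, all units in the other layer are conditionally independent and can be sampled at once.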
CDBN - Architecture
CDBN – Training
- Contrastive divergence (Hinton, 2002):
  - Calculate the hidden activations given the input.
  - Calculate the visible state given the hidden.
  - Calculate the hidden activations again. This gives us ⟨vh⟩⁽¹⁾.
  - It would be better to have ⟨vh⟩⁽∞⁾ (i.e. to maximise the log probability of the training data given the model parameters), but this is hard because we cannot compute the normalising constant of such a complex model (it is a product-of-experts model, and the required integration is intractable).
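The contrastive divergence steps above can be sketched as a single CD-1 weight update (biases omitted, toy sizes, hypothetical names; the reconstruction uses visible probabilities rather than samples, a common variance-reduction choice):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_v, n_h, eta = 36, 16, 0.1
W = rng.standard_normal((n_v, n_h)) * 0.01

def cd1_update(v0, W):
    """One CD-1 step: W += eta * (<v h>_data - <v h>^(1))."""
    p_h0 = sigmoid(v0 @ W)                         # hidden given the input
    h0 = (rng.random(n_h) < p_h0).astype(float)    # sample a hidden state
    p_v1 = sigmoid(W @ h0)                         # visible given hidden
    p_h1 = sigmoid(p_v1 @ W)                       # hidden again -> <v h>^(1)
    return W + eta * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))

v0 = (rng.random(n_v) < 0.5).astype(float)         # one binary training vector
W = cd1_update(v0, W)
```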
CDBN – Hierarchical Inference
- Once the parameters (weights) have been learned, we can ask the network what it thinks about a given image.
- We present the image to the visible layer and then compute the network's representation of the image by sampling from the joint distribution over all the hidden layers, using block Gibbs sampling.
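A toy sketch of block Gibbs sampling over a stack of two hidden layers with the image clamped to the visible layer (fully connected rather than convolutional, purely illustrative; all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

sizes = [16, 12, 8]                                 # visible v, hidden h1, hidden h2
W1 = rng.standard_normal((sizes[0], sizes[1])) * 0.1
W2 = rng.standard_normal((sizes[1], sizes[2])) * 0.1

v = (rng.random(sizes[0]) < 0.5).astype(float)      # image, clamped throughout
h1 = np.zeros(sizes[1])
h2 = np.zeros(sizes[2])

for _ in range(20):                                 # block Gibbs sweeps
    # h1 is conditioned on both its neighbours: v below and h2 above
    p1 = sigmoid(v @ W1 + W2 @ h2)
    h1 = (rng.random(sizes[1]) < p1).astype(float)
    # h2 is conditioned only on h1
    p2 = sigmoid(h1 @ W2)
    h2 = (rng.random(sizes[2]) < p2).astype(float)
```

After the sweeps, the hidden states (or their means) serve as the network's representation of the clamped image.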
CDBN – Results
- Tested the performance of a two-layer network on the Caltech-101 object classification task.
- Achieved 65% test accuracy, which is comparable to state-of-the-art results.
- Achieved 0.8% test error on MNIST digit classification.
- Hierarchical inference could be performed on occluded objects when hidden layers could share information.
Speed!
- Both of these algorithms require either a lot of sampling (to arrive at distributions, in the CDBN) or complex optimisations.
- The CDBN can take weeks to learn.
- Sparse coding networks have been shown to exhibit more realistic behaviour (e.g. end-stopping) when the networks are much larger.
NVIDIA GPU
GPU - Architecture
Issues with parallelisation
- CDBNs and sparse coding algorithms rely on iterative, stochastic parameter updates that depend on previous updates.
  - i.e. weight updates require a sample from the whole representation.
- Memory transfers between RAM and the GPU's global memory are slow.
Algorithms
- For sparse networks, parallelisation is achieved by noting that the objective is not convex in both variables jointly, but it is convex in one variable when the other is kept fixed.
  - The two variables are the basis functions and the activations.
  - The algorithm optimises each activation value in parallel.
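A minimal sketch of this alternating scheme, with an L1 penalty and a vectorised shrinkage step so that every activation is updated in parallel (hypothetical sizes and step sizes, placeholder data; on a GPU the matrix products below are what gets parallelised):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_bases, n_examples = 64, 32, 100
X = rng.standard_normal((n_pix, n_examples))       # data matrix (placeholder)
B = rng.standard_normal((n_pix, n_bases))
B /= np.linalg.norm(B, axis=0)                     # unit-norm basis columns
A = np.zeros((n_bases, n_examples))                # activations, one column per example
lam, eta = 0.1, 0.05

for _ in range(50):
    # Step 1: bases fixed -> objective convex in A; shrinkage (ISTA-style)
    # gradient step updates all activations for all examples at once
    Z = A + eta * (B.T @ (X - B @ A))
    A = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam, 0.0)
    # Step 2: activations fixed -> objective convex in B; gradient step,
    # then renormalise the basis columns
    B += eta * (X - B @ A) @ A.T / n_examples
    B /= np.linalg.norm(B, axis=0)
```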
Results
- RBM with 1 million free parameters and 4096x11008 visible x hidden units.
  - 130x speedup.
- Sparse coding with 5000 examples and 1024 activations.
  - 15x speedup.
References
- Hinton, G.E., Osindero, S. & Teh, Y. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527-1554.
- Hinton, G.E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14(8):1771-1800.
- LeCun, Y., Kavukcuoglu, K. & Farabet, C. (2010). Convolutional Networks and Applications in Vision. Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS).
- Lee, H., Grosse, R., Ranganath, R. & Ng, A.Y. (2009). Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada.
- Olshausen, B.A. & Field, D.J. (1996). Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images. Nature, 381:607-609.
- Raina, R., Madhavan, A. & Ng, A.Y. (2009). Large-Scale Deep Unsupervised Learning Using Graphics Processors. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada.
- Salakhutdinov, R. & Murray, I. (2008). On the Quantitative Analysis of Deep Belief Nets. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.