UNSUPERVISED LEARNING IN COMPUTER VISION
Jordan Campbell
Overview
- Convolutional deep belief networks (CDBNs)
- Sparse coding
Natural Images
- 2-D matrix of intensity values.
- Exhibit certain statistical properties.
- Typically composed of edges, object parts and objects.
- Can we train networks to learn representations of natural images?
Network Representation
[Figure: sparse coding basis functions learnt from 16x16 image patches of natural scenes]
[Figure: CDBN first-layer bases (edges, top box) and second-layer bases (object parts, second box)]
Sparse coding
- Mammalian V1 simple cells are localised, oriented and bandpass.
- A sparse code produces basis functions with these properties and provides a highly efficient representation of the original image (it removes redundancy).
Sparse coding - Goal
- The aim was to find a set of basis functions that were:
  - Sparse
  - Highly representative
- Natural images are highly non-Gaussian and not well described by orthogonal components, so PCA is not appropriate.
Sparse coding - algorithm
- The image is represented as a linear combination of basis functions: $I(x,y) = \sum_i a_i \phi_i(x,y)$
- The goal of learning is to minimise the cost function: $E = \sum_{x,y}\big[I(x,y) - \sum_i a_i \phi_i(x,y)\big]^2 + \lambda \sum_i S(a_i/\sigma)$
Sparse coding – Equations
- Accuracy: the reconstruction error $\sum_{x,y}\big[I(x,y) - \hat{I}(x,y)\big]^2$, where $\hat{I}(x,y) = \sum_i a_i \phi_i(x,y)$
- Sparseness: the penalty $\lambda \sum_i S(a_i/\sigma)$, with a nonlinearity such as $S(x) = \log(1+x^2)$
- Learning: gradient descent on the bases, $\Delta \phi_i(x,y) = \eta \, a_i \big[I(x,y) - \hat{I}(x,y)\big]$, with the activations $a_i$ found by minimising $E$ for each image
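The accuracy, sparseness and learning terms can be sketched numerically. A minimal NumPy sketch of the Olshausen–Field scheme, with hypothetical sizes, learning rates and random placeholder data in place of real image patches:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bases = 256, 192            # 16x16 patches, 192 basis functions
phi = rng.standard_normal((n_pixels, n_bases)) * 0.1  # random initial bases
I = rng.standard_normal(n_pixels)       # one image patch (placeholder data)
lam, sigma, eta_a, eta_phi = 0.1, 1.0, 0.01, 1e-4

def S_prime(x):                         # derivative of the penalty S(x) = log(1 + x^2)
    return 2.0 * x / (1.0 + x ** 2)

# Inference: gradient descent on the activations a_i for this patch,
# balancing the accuracy term against the sparseness penalty
a = np.zeros(n_bases)
for _ in range(200):
    residual = I - phi @ a              # accuracy term: I - I_hat
    a += eta_a * (phi.T @ residual - (lam / sigma) * S_prime(a / sigma))

# Learning: Hebbian-style update of the bases on the residual
phi += eta_phi * np.outer(I - phi @ a, a)
```

In the full algorithm the inference loop runs per image patch and the basis update is averaged over many patches; the step sizes here are illustrative.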
Sparse coding - results
- 192 basis functions were trained on 16x16 image patches of natural scenes.
- The resultant functions were localised, oriented and bandpass.
- Note that these are only the main spatial properties of simple-cell receptive fields; simple cells have others, and there are other, more complex cell types elsewhere in the visual hierarchy.
CDBN - preliminaries
- RBM:
  - Two-layer undirected bipartite graph.
  - Assume binary activations.
  - Gibbs sampling for learning and inference.
- Convolutional RBM:
  - K groups of hidden and pooling units.
  - Weight sharing (translation invariance).
- Probabilistic max-pooling:
  - Shrinks the representation of the detection layer.
- Sparsity regularisation.
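A minimal sketch of alternating Gibbs sampling in a binary RBM (toy sizes and random weights; all names are hypothetical, and the convolutional weight sharing and pooling are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 36, 16          # toy layer sizes
W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b, c = np.zeros(n_visible), np.zeros(n_hidden)  # visible / hidden biases

def sample_h_given_v(v):
    """One half of a Gibbs step: P(h_j = 1 | v) = sigmoid(c_j + v . W[:, j])."""
    p = sigmoid(c + v @ W)
    return p, (rng.random(n_hidden) < p).astype(float)

def sample_v_given_h(h):
    """Other half of the step: P(v_i = 1 | h) = sigmoid(b_i + W[i, :] . h)."""
    p = sigmoid(b + W @ h)
    return p, (rng.random(n_visible) < p).astype(float)

v = (rng.random(n_visible) < 0.5).astype(float)  # random binary start
for _ in range(10):                   # alternating Gibbs sampling
    _, h = sample_h_given_v(v)
    _, v = sample_v_given_h(h)
```

The bipartite structure is what makes this efficient: given one layer, all units in the other layer are conditionally independent and can be sampled at once.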
CDBN - Architecture
CDBN – Training
- Contrastive divergence (Hinton, 2002):
  - Calculate the hidden activations given the input.
  - Calculate the visible state given the hidden.
  - Calculate the hidden activations again. This gives us ⟨vh⟩⁽¹⁾.
  - It would be better to have ⟨vh⟩⁽∞⁾ (i.e. to maximise the log probability of the training data given the model parameters), but this is hard because we cannot compute the normalising constant of such a complex model (it is a product-of-experts model, and the required integration is intractable).
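The contrastive divergence steps above can be sketched as a single CD-1 weight update (biases omitted, toy sizes, hypothetical names; the reconstruction uses visible probabilities rather than samples, a common variance-reduction choice):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_v, n_h, eta = 36, 16, 0.1
W = rng.standard_normal((n_v, n_h)) * 0.01

def cd1_update(v0, W):
    """One CD-1 step: W += eta * (<v h>_data - <v h>^(1))."""
    p_h0 = sigmoid(v0 @ W)                         # hidden given the input
    h0 = (rng.random(n_h) < p_h0).astype(float)    # sample a hidden state
    p_v1 = sigmoid(W @ h0)                         # visible given hidden
    p_h1 = sigmoid(p_v1 @ W)                       # hidden again -> <v h>^(1)
    return W + eta * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))

v0 = (rng.random(n_v) < 0.5).astype(float)         # one binary training vector
W = cd1_update(v0, W)
```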
CDBN – Hierarchical Inference
- Once the parameters (weights) have been learned, we can ask the network what it thinks about a given image.
- We present the image to the visible layer and then compute the network's representation of the image by sampling from the joint distribution over all the hidden layers, using block Gibbs sampling.
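A toy sketch of block Gibbs sampling over a stack of two hidden layers with the image clamped to the visible layer (fully connected rather than convolutional, purely illustrative; all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

sizes = [16, 12, 8]                                 # visible v, hidden h1, hidden h2
W1 = rng.standard_normal((sizes[0], sizes[1])) * 0.1
W2 = rng.standard_normal((sizes[1], sizes[2])) * 0.1

v = (rng.random(sizes[0]) < 0.5).astype(float)      # image, clamped throughout
h1 = np.zeros(sizes[1])
h2 = np.zeros(sizes[2])

for _ in range(20):                                 # block Gibbs sweeps
    # h1 is conditioned on both its neighbours: v below and h2 above
    p1 = sigmoid(v @ W1 + W2 @ h2)
    h1 = (rng.random(sizes[1]) < p1).astype(float)
    # h2 is conditioned only on h1
    p2 = sigmoid(h1 @ W2)
    h2 = (rng.random(sizes[2]) < p2).astype(float)
```

After the sweeps, the hidden states (or their means) serve as the network's representation of the clamped image.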
CDBN – Results
- Tested the performance of a two-layer network on the Caltech-101 object classification task.
- Achieved 65% test accuracy, which is comparable to state-of-the-art results.
- Achieved 0.8% test error on MNIST digit classification.
- Hierarchical inference could be performed on occluded objects when hidden layers could share information.
Speed!
- Both of these algorithms require either a lot of sampling (to arrive at distributions, in the CDBN) or complex optimisations.
- The CDBN can take weeks to learn.
- Sparse coding networks have been shown to exhibit more realistic behaviour (e.g. end-stopping) when the networks are much larger.
NVIDIA GPU
GPU - Architecture
Issues with parallelisation
- CDBNs and sparse coding algorithms rely on iterative, stochastic parameter updates that depend on previous updates.
  - i.e. weight updates require a sample from the whole representation.
- Memory transfers between RAM and the GPU's global memory are slow.
Algorithms
- For sparse networks, parallelisation is achieved by noting that the objective is not convex in both variables jointly, but it is convex in one variable when the other is kept fixed.
  - The two variables are the basis functions and the activations.
  - The algorithm optimises each activation value in parallel.
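A minimal sketch of this alternating scheme, with an L1 penalty and a vectorised shrinkage step so that every activation is updated in parallel (hypothetical sizes and step sizes, placeholder data; on a GPU the matrix products below are what gets parallelised):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_bases, n_examples = 64, 32, 100
X = rng.standard_normal((n_pix, n_examples))       # data matrix (placeholder)
B = rng.standard_normal((n_pix, n_bases))
B /= np.linalg.norm(B, axis=0)                     # unit-norm basis columns
A = np.zeros((n_bases, n_examples))                # activations, one column per example
lam, eta = 0.1, 0.05

for _ in range(50):
    # Step 1: bases fixed -> objective convex in A; shrinkage (ISTA-style)
    # gradient step updates all activations for all examples at once
    Z = A + eta * (B.T @ (X - B @ A))
    A = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam, 0.0)
    # Step 2: activations fixed -> objective convex in B; gradient step,
    # then renormalise the basis columns
    B += eta * (X - B @ A) @ A.T / n_examples
    B /= np.linalg.norm(B, axis=0)
```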
Results
- RBM with 1 million free parameters and 4096x11008 visible x hidden units.
  - 130x speedup.
- Sparse coding with 5000 examples and 1024 activations.
  - 15x speedup.
References
- Hinton, G.E., Osindero, S. & Teh, Y. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527-1554.
- Hinton, G.E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14(8):1771-1800.
- LeCun, Y., Kavukcuoglu, K. & Farabet, C. (2010). Convolutional Networks and Applications in Vision. Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS).
- Lee, H., Grosse, R., Ranganath, R. & Ng, A.Y. (2009). Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada.
- Olshausen, B.A. & Field, D.J. (1996). Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images. Nature, 381:607-609.
- Raina, R., Madhavan, A. & Ng, A.Y. (2009). Large-Scale Deep Unsupervised Learning Using Graphics Processors. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada.
- Salakhutdinov, R. & Murray, I. (2008). On the Quantitative Analysis of Deep Belief Nets. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.