
Page 1:

TUTORIAL PART 1
Unsupervised Learning

Marc'Aurelio Ranzato
Department of Computer Science – Univ. of Toronto
ranzato@cs.toronto.edu

Deep Learning and Unsupervised Feature Learning Workshop, 10 Dec. 2010

Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew Ng

Page 2:

Feature Learning

[Figure: motorbike vs. "non"-motorbike images plotted in an input space with axes such as brightness and color, fed directly to a learning algorithm.]

kindly borrowed from Andrew Ng ECCV10

Page 3:

Feature Learning

[Figure: the same motorbike example, but a feature extractor first maps the input into a feature space with axes such as "handle" and "wheel", where the two classes become easier to separate.]

kindly borrowed from Andrew Ng ECCV10

Page 4:

How is computer perception done?

Image -> low-level vision features -> recognition (object detection)
Audio -> low-level audio features -> speaker identification (audio classification)
Helicopter -> low-level state features -> action (helicopter control)

Page 5:

Computer vision features

SIFT, Spin image, HoG, RIFT, Textons, GLOH

kindly borrowed from Andrew Ng ECCV10

Page 6:

Audio features

Spectrogram, MFCC, ZCR, Rolloff, Flux

kindly borrowed from Andrew Ng ECCV10

Page 7:

Engineering features:
- Needs expert knowledge
- Sub-optimal
- Time-consuming and expensive
- Does not generalize to other domains

Page 8:

The goal of Unsupervised Feature Learning

Unlabeled images -> learning algorithm -> feature representation

kindly borrowed from Andrew Ng ECCV10

Page 9:

Outline

- What is Unsupervised Learning?
- Unsupervised Learning Algorithms
- Comparing Unsupervised Learning Algorithms
- Deep Learning (Tutorial II)

Page 10:

Unsupervised Learning

Data points belonging to 3 classes

Page 11:

Unsupervised Learning

No labels are provided during training

Page 12:

Unsupervised Learning

Fit mixture of 3 Gaussians: use responsibility to represent a data point (indicative of its class)

Page 13:

Unsupervised Learning

- Density estimation
- Latent variables (-> features, possibly useful for discrimination)

Page 14:

Unsupervised Learning

- Density estimation
- Latent variables (-> features, possibly useful for discrimination)
- Energy-based interpretation: each data point x has an associated energy E(x); training has to make E(x) lower for x in the training set

[Plot: energy E(x) as a function of x]

Page 15:

Unsupervised Learning (energy-based interpretation, continued)

[Plot: energy E(x) as a function of x, BEFORE TRAINING]

Page 16:

Unsupervised Learning (energy-based interpretation, continued)

[Plot: energy E(x) as a function of x, AFTER TRAINING: the energy is lower at the training points]

Page 17:

Principal Component Analysis

Training: minimize E s.t. an orthogonality constraint on W

E(X, Z; W) = ||X - W Z||^2

Feature: Z = W'X; it must be lower dimensional

[Plot: input data points and their reconstructions in a 1D feature space]

Page 18: (repeat of Page 17)

Page 19:

Principal Component Analysis

E(X, Z; W) = ||X - W Z||^2
Feature: Z = W'X

PROS:
- Simple training (tuning free)
- Unique solution
- Fast

CONS:
- Feature must be lower dimensional
- Features are linear
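As a concrete illustration (not from the slides), a minimal numpy sketch of PCA: W comes from the SVD of the centered data, the feature is Z = W'X, and the energy is the squared reconstruction error. The function names are mine.

```python
import numpy as np

def pca_fit(X, k):
    """Return W (n_dims x k) with orthonormal columns: the top-k principal
    directions of the centered data X (n_samples x n_dims)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def pca_energy(X, W):
    """E(X, Z; W) = ||X - W Z||^2 with Z = W'X (samples as rows, so Z = Xc @ W)."""
    Xc = X - X.mean(axis=0)
    Z = Xc @ W                       # lower-dimensional feature
    return np.sum((Xc - Z @ W.T) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 5))  # 5-D data on a 3-D subspace
W2, W3 = pca_fit(X, 2), pca_fit(X, 3)
assert np.allclose(W2.T @ W2, np.eye(2))                 # orthogonality constraint holds
assert pca_energy(X, W3) <= pca_energy(X, W2) + 1e-9     # more components, lower E
assert pca_energy(X, W3) < 1e-6                          # rank-3 data: 3 components suffice
```

Note the unique, tuning-free solution promised in the PROS: the SVD gives it directly, with no iterative optimization.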

Page 20:

Auto-encoder Neural Network

Training: minimize E

E(X, Z; W) = ||X - f(W Z)||^2

Feature: Z = g(A X), lower dimensional

[Plot: input data points and their reconstructions in a 1D feature space]

Page 21:

Auto-encoder Neural Network

E(X, Z; W) = ||X - f(W Z)||^2
Feature: Z = g(A X), lower dimensional

PROS:
- Non-linear features
- Pretty fast training

CONS:
- Feature must be lower dimensional
- A few hyper-parameters
- Optimization becomes hard if highly non-linear
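A minimal numpy sketch of such an auto-encoder (an illustration, not the exact setup from the slides): sigmoid encoder g, linear decoder f, and plain gradient descent on E. All names here are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 3, 256
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))   # data near a 3-D subspace
X -= X.mean(axis=0)

A = 0.1 * rng.normal(size=(k, d))   # encoder weights: Z = g(A X), g = sigmoid
W = 0.1 * rng.normal(size=(d, k))   # decoder weights: Xhat = f(W Z), f linear here

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def loss_and_grads(X, A, W):
    Z = sigmoid(X @ A.T)            # feature, lower dimensional (k < d)
    Xhat = Z @ W.T                  # reconstruction
    R = Xhat - X
    loss = np.sum(R ** 2)           # E(X, Z; W)
    dW = 2 * R.T @ Z
    dZ = 2 * R @ W
    dA = (dZ * Z * (1 - Z)).T @ X   # backprop through the sigmoid encoder
    return loss, dA, dW

lr = 1e-3
loss0, _, _ = loss_and_grads(X, A, W)
for _ in range(500):
    loss, dA, dW = loss_and_grads(X, A, W)
    A -= lr * dA
    W -= lr * dW
assert loss < loss0                 # training lowers the energy on the training set
```

Unlike PCA there is no closed-form solution, which is where the "a few hyper-parameters" (learning rate, architecture) in the CONS come from.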

Page 22:

Denoising Auto-encoder

Training: minimize E

E(X, Z; W) = ||X - f(W Z)||^2

Feature: Z = g(A (X + n)), where n is noise

[Plot: input data points and their reconstructions in a 1D feature space]

Page 23:

Denoising Auto-encoder

E(X, Z; W) = ||X - f(W Z)||^2
Feature: Z = g(A (X + n))

PROS:
- Non-linear features
- Pretty fast training
- Robustness to noise in the input
- Possibly higher-dimensional features
- Can check convergence

CONS:
- A few hyper-parameters
- Optimization becomes hard if highly non-linear
- Choice of noise distribution
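The only change from the plain auto-encoder is where the noise enters: the encoder sees a corrupted input, but the energy still compares the reconstruction against the clean X. A small numpy sketch (Gaussian corruption is one choice among several; the slide deliberately leaves the noise distribution open):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def denoising_energy(X, A, W, sigma, rng):
    """E(X, Z; W) = ||X - f(W Z)||^2 with Z = g(A(X + n)):
    encode the corrupted input, reconstruct the clean one (f linear here)."""
    n = sigma * rng.normal(size=X.shape)   # the noise n (Gaussian, one option)
    Z = sigmoid((X + n) @ A.T)
    return np.sum((X - Z @ W.T) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
A = rng.normal(size=(6, 4))                # note: Z may be higher dimensional here
W = rng.normal(size=(4, 6))
E_clean = denoising_energy(X, A, W, sigma=0.0, rng=rng)
E_noisy = denoising_energy(X, A, W, sigma=0.5, rng=rng)
assert E_clean >= 0.0 and E_noisy >= 0.0
# with sigma = 0 this reduces exactly to the plain auto-encoder energy
assert np.isclose(E_clean, np.sum((X - sigmoid(X @ A.T) @ W.T) ** 2))
```

The corruption is what "pulls up" the energy away from the data, which is why the feature no longer has to be lower dimensional.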

Page 24:

K-Means

Training: minimize E

E(X, Z; W) = ||X - W Z||^2

Feature: Z is a 1-of-N code (one-hot)

[Plot: input data points and their reconstructions with 2 prototypes]

Page 25:

K-Means

E(X, Z; W) = ||X - W Z||^2
Feature: Z is a 1-of-N code

PROS:
- Simple training (tuning free)
- Fast

CONS:
- One might need lots of prototypes to cover a high-dimensional space
- Representation is too sparse
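In this energy view, the columns of W are the prototypes and the 1-of-N code Z simply selects the nearest one, so W Z is the reconstruction. A numpy sketch of Lloyd's algorithm under that reading (function and variable names are mine):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm. Columns of W are prototypes; the feature Z of a
    point is the 1-of-N code of its nearest prototype, so the reconstruction
    W Z is that prototype and E is the quantization error."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].T        # (d, k) init from data
    for _ in range(iters):
        d2 = ((X[:, :, None] - W[None, :, :]) ** 2).sum(axis=1)  # (n, k) distances
        z = d2.argmin(axis=1)            # index of the single 1 in each code
        for j in range(k):
            if np.any(z == j):           # keep empty prototypes in place
                W[:, j] = X[z == j].mean(axis=0)
    return W, z

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.3, size=(50, 2)),
               rng.normal(+3.0, 0.3, size=(50, 2))])   # two tight, well-separated clusters
W, z = kmeans(X, k=2)
assert (z[:50] == z[0]).all() and (z[50:] == z[50]).all() and z[0] != z[50]
```

The CONS above are visible directly: each point is described by a single prototype index, the sparsest (and least expressive) code possible.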

Page 26:

Sparse Coding

Training: minimize E over W and Z (coordinate descent)

E(X, Z; W) = ||X - W Z||^2 + λ|Z|_1

Feature: Z, sparse

[Plot: input data points and their reconstructions with 2 components]

Page 27:

Sparse Coding

E(X, Z; W) = ||X - W Z||^2 + λ|Z|_1
Feature: Z, sparse

PROS:
- Possibly higher-dimensional features
- It often yields more interpretable features
- Biologically plausible (?)

CONS:
- Expensive training
- Expensive inference
- Need to tune λ
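A numpy sketch of the inference step that makes sparse coding expensive. The slides mention coordinate descent; here I use ISTA (proximal gradient) instead, which solves the same convex problem for Z with the dictionary W held fixed; `lam` plays the role of λ.

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def sparse_inference(x, W, lam, iters=500):
    """Minimize ||x - W z||^2 + lam * |z|_1 over z, with W fixed, via ISTA:
    a gradient step on the quadratic term, then a soft-threshold for the L1."""
    L = 2.0 * np.linalg.norm(W, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(W.shape[1])
    for _ in range(iters):
        grad = 2.0 * W.T @ (W @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 20))                  # overcomplete: 20 atoms for 10 dims
W /= np.linalg.norm(W, axis=0)                 # unit-norm dictionary atoms
z_true = np.zeros(20); z_true[[3, 11]] = [1.5, -2.0]
x = W @ z_true
z = sparse_inference(x, W, lam=0.1)
assert np.sum(np.abs(z) > 1e-3) <= 8           # the inferred code is sparse
assert np.sum((x - W @ z) ** 2) < 0.5          # and reconstructs x well
```

Note that even this single inference takes hundreds of iterations, which is exactly the "expensive inference" item in the CONS (and the motivation for the next slide).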

Page 28:

Predictive Sparse Coding

E(X, Z; W) = ||X - W Z||^2 + λ|Z|_1 + α||Z - g(A'X)||^2

Feature: Z, sparse

PROS:
- Possibly higher-dimensional features
- Fast inference

CONS:
- Expensive training
- Need to tune λ, α
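To make the extra term concrete, a direct numpy transcription of this energy. The names are hypothetical: g is taken to be tanh here, and `lam`, `alpha` stand for the two weighting hyper-parameters that the slide leaves implicit.

```python
import numpy as np

def psd_energy(x, z, W, A, lam, alpha):
    """||x - W z||^2 + lam*|z|_1 + alpha*||z - g(A'x)||^2, with g = tanh.
    The third term trains the feed-forward predictor g(A'x) to mimic the
    optimal sparse code, which is what makes test-time inference fast."""
    pred = np.tanh(A.T @ x)
    return (np.sum((x - W @ z) ** 2)
            + lam * np.sum(np.abs(z))
            + alpha * np.sum((z - pred) ** 2))

rng = np.random.default_rng(0)
d, k = 6, 10
x = rng.normal(size=d)
W = rng.normal(size=(d, k))
A = rng.normal(size=(d, k))            # so A'x has the code dimension k
z = np.tanh(A.T @ x)                   # if z equals the prediction...
full = psd_energy(x, z, W, A, lam=0.1, alpha=1.0)
no_pred = np.sum((x - W @ z) ** 2) + 0.1 * np.sum(np.abs(z))
assert np.isclose(full, no_pred)       # ...the third term vanishes
```

At test time one simply uses Z ≈ g(A'x), a single feed-forward pass, instead of running an iterative solver per input.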

Page 29:

Restricted Boltzmann Machine

E(X, Z; W) = -Z'W'X
           = -(w11 X1 Z1 + w12 X1 Z2 + w21 X2 Z1 + ...)

[Diagram: bipartite graph; visible units X1, X2, X3 fully connected to hidden units Z1, Z2 through weights W]

All variables are binary: Xi ∈ {0,1}, Zj ∈ {0,1}

Page 30:

Restricted Boltzmann Machine

E(X, Z; W) = -Z'W'X

p(X, Z; W) = exp(-E(X, Z; W)) / Σ_{x'} Σ_{z'} exp(-E(x', z'; W))

Page 31:

Restricted Boltzmann Machine

p(X, Z; W) = exp(-E(X, Z; W)) / Σ_{x'} Σ_{z'} exp(-E(x', z'; W))

The normalizer (partition function) is INTRACTABLE.

Page 32:

Restricted Boltzmann Machine

E(X, Z; W) = -Z'W'X

p(Xi = 1 | Z; W) = σ((W Z)_i)
p(Zj = 1 | X; W) = σ((W'X)_j)

where σ(u) = 1 / (1 + exp(-u)), and each conditional factorizes over units.

Page 33:

Restricted Boltzmann Machine

p(X | Z; W) = σ(W Z)   (element-wise)
p(Z | X; W) = σ(W'X)   (element-wise)

Easy conditionals: efficient Gibbs sampling

Page 34:

Restricted Boltzmann Machine

Gibbs sampling, step 1: sample the hidden units from p(Z | X; W)

Page 35:

Restricted Boltzmann Machine

Gibbs sampling, step 2: sample the visible units from p(X | Z; W)

Page 36:

Restricted Boltzmann Machine

Gibbs sampling, step 3: sample from p(Z | X; W) again, and keep alternating...

Page 37:

Restricted Boltzmann Machine

p(Z | X; W) = σ(W'X): subsequently used as features
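The alternating conditional sampling on the last few slides can be sketched in a few lines of numpy, for the bias-free energy E(X, Z; W) = -Z'W'X (variable and function names are mine):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gibbs_step(x, W, rng):
    """One Gibbs sweep for an RBM with E(x, z; W) = -z'W'x (no bias terms):
    sample z ~ p(z|x) = sigmoid(W'x), then x ~ p(x|z) = sigmoid(W z)."""
    pz = sigmoid(W.T @ x)
    z = (rng.random(pz.shape) < pz).astype(float)   # binary hidden sample
    px = sigmoid(W @ z)
    x_new = (rng.random(px.shape) < px).astype(float)
    return x_new, z

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 2))   # 3 visible units, 2 hidden units
x = rng.integers(0, 2, size=3).astype(float)
for _ in range(100):                     # run the chain; all states stay binary
    x, z = gibbs_step(x, W, rng)
assert set(np.unique(x)) <= {0.0, 1.0}
assert z.shape == (2,)
```

Because the graph is bipartite, each half of the sweep samples all units on one side in parallel, which is what makes Gibbs sampling cheap per step here.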

Page 38:

Restricted Boltzmann Machine

E(X, Z; W) = -Z'W'X
Loss = -log p(X; W)

W <- W - [ Σ_z p(z | X_t) ∂E(X_t, z; W)/∂W - Σ_{x,z} p(x, z) ∂E(x, z; W)/∂W ]

Page 39:

Restricted Boltzmann Machine (same update rule)

[Plot: energy E(x) as a function of x, BEFORE UPDATE]

Page 40:

Restricted Boltzmann Machine (same update rule)

[Plot: energy E(x) as a function of x, AFTER UPDATE]

Page 41:

Restricted Boltzmann Machine

USING MCMC: approximate the intractable model expectation with a sample X_m drawn by running the Gibbs chain:

W <- W - [ Σ_z p(z | X_t) ∂E(X_t, z; W)/∂W - Σ_z p(z | X_m) ∂E(X_m, z; W)/∂W ]

Page 42:

Restricted Boltzmann Machine

W <- W - [ Σ_z p(z | X_t) ∂E(X_t, z; W)/∂W - Σ_z p(z | X_m) ∂E(X_m, z; W)/∂W ]

In practice, the Gibbs sampler takes too long to converge; alternatives:
- Contrastive Divergence
- Persistent Contrastive Divergence
- Fast Persistent Contrastive Divergence
- Score Matching
- Ratio Matching
- Margin-Based Losses
- Variational Methods
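A numpy sketch of the most common of these choices, CD-1 (contrastive divergence with a single Gibbs step), for the bias-free energy E(X, Z; W) = -Z'W'X. This is an illustration under my own naming, not the exact recipe from the slides:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cd1_update(X, W, lr, rng):
    """One CD-1 step. Positive phase: hidden probabilities given the data X_t.
    Negative phase: one Gibbs step produces a 'model' sample X_m, whose
    statistics stand in for the intractable expectation under p(x, z; W)."""
    pz_data = sigmoid(X @ W)                             # p(z=1 | X_t)
    z = (rng.random(pz_data.shape) < pz_data).astype(float)
    Xm = (rng.random(X.shape) < sigmoid(z @ W.T)).astype(float)
    pz_model = sigmoid(Xm @ W)                           # p(z=1 | X_m)
    # since dE/dW = -x z', this lowers E on the data and raises it on samples
    return W + lr * (X.T @ pz_data - Xm.T @ pz_model) / len(X)

rng = np.random.default_rng(0)
# toy binary data: units 1 and 2 co-occur, unit 3 appears alone
X = np.array([[1., 1., 0.]] * 40 + [[0., 0., 1.]] * 20)
W = 0.01 * rng.normal(size=(3, 2))
for _ in range(200):
    W = cd1_update(X, W, lr=0.1, rng=rng)
assert np.isfinite(W).all() and np.abs(W).max() > 0.05  # the weights have moved
```

The update has exactly the two-term shape above; CD-1 just replaces the fully mixed chain with a single step started at the data.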

Page 43:

Restricted Boltzmann Machine

Feature: Z = g(W'X), with g the logistic sigmoid

PROS:
- Possibly higher-dimensional features
- It can generate data
- Simple interpretation of the learning rule
- Simple to extend the variables to other distributions

CONS:
- Exact learning is intractable
- Approximations make convergence hard to assess
- A few hyper-parameters to tune

Page 44:

Comparison

Linear features?
- PCA: yes
- Auto-encoder: no
- K-Means: no
- Sparse Coding: no
- RBM: no
- Denoising Auto-enc.: no

Page 45:

Comparison

Sparse features?
- PCA: no (but it can be added)
- Auto-encoder: no (but it can be added)
- K-Means: yes
- Sparse Coding: yes
- RBM: no (but it can be added)
- Denoising Auto-enc.: no (but it can be added)

Page 46:

Comparison

Energy "pull-up" mechanism:
- PCA: restriction on code
- Auto-encoder: restriction on code
- K-Means: restriction on code
- Sparse Coding: restriction on code
- RBM: partition function
- Denoising Auto-enc.: noise added to the input

Page 47:

Comparison

[Figure: filters learned by PCA on 8x8 patches vs. ICA on 8x8 patches]

Page 48:

Comparing Unsupervised Algorithms

- Properties of the features
- Reconstruction error
- Likelihood on test data
- Discriminative performance of a classifier trained on the features
- Statistical dependency of the components
- Denoising performance
- Other tasks

Page 49:

References: RBM

- Hinton, "Training Products of Experts by Minimizing Contrastive Divergence", Neural Computation 2002
- Welling, Rosen-Zvi, Hinton, "Exponential Family Harmoniums with an Application to Information Retrieval", NIPS 2005

Code:
- http://www.cs.toronto.edu/~hinton/code/rbm.m (Matlab)
- http://www.cs.toronto.edu/~tijmen/gnumpy_example.py (Python, using the gnumpy module to run on a GPU)

Page 50:

References: sparse RBM

- Lee, Ekanadham, Ng, "Sparse deep belief net model for visual area V2", NIPS 2008
- Lee, Grosse, Ranganath, Ng, "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", ICML 2009

Page 51:

References: S-RBM

- Osindero, Hinton, "Modeling image patches with a directed hierarchy of Markov random fields", NIPS 2008

Page 52:

References: mcRBM

- Ranzato, Hinton, "Modeling pixel means and covariances using factorized third-order Boltzmann machines", CVPR 2010
- Ranzato, Mnih, Hinton, "Generating more realistic images using gated MRF", NIPS 2010

Code:
- http://www.cs.toronto.edu/~ranzato/publications/mcRBM/code/mcRBM_04May2010.zip (Python, using CUDAMAT to run on a GPU)

Page 53:

References: Sparse Coding

- Olshausen, Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?", Vision Research 1997
- Lee, Battle, Raina, Ng, "Efficient sparse coding algorithms", NIPS 2007
- Lee, Raina, Teichman, Ng, "Exponential family sparse coding with applications to self-taught learning", IJCAI 2009

Code:
- http://www.eecs.umich.edu/~honglak/softwares/nips06-sparsecoding.htm
- http://www.di.ens.fr/willow/SPAMS/ (Julien Mairal's work on fast sparse coding methods, with several extensions)
- http://www.cs.technion.ac.il/~ronrubin/software.html (see also work by M. Elad et al. on K-SVD, another sparse coding algorithm)

Page 54:

References: Predictive Sparse Coding

- Kavukcuoglu, Ranzato, LeCun, "Fast inference in sparse coding algorithms with applications to object recognition", CBLL Technical Report, December 2008; arXiv:1010.3467
- Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun, "Learning Convolutional Feature Hierarchies for Visual Recognition", NIPS 2010

Code:
- http://cs.nyu.edu/~koray/wp/?page_id=29 (basic algorithm for predictive sparse decomposition)

Page 55:

References: Local Coordinate Coding

- Yu, Zhang, Gong, "Nonlinear learning using local coordinate coding", NIPS 2009
- Lin, Zhang, Zhu, Yu, "Deep coding networks", NIPS 2010

Page 56:

References: Product of Student's t

- Osindero, Welling, Hinton, "Topographic product models applied to natural scene statistics", Neural Computation 2006

Page 57:

References: Denoising Auto-encoders

- Vincent, Larochelle, Bengio, Manzagol, "Extracting and composing robust features with denoising autoencoders", ICML 2008

Code:
- http://www.deeplearning.net/tutorial/dA.html#daa (code written in Theano, a Python library with a GPU interface, developed in Y. Bengio's lab; see more at http://www.deeplearning.net/tutorial/)

Page 58:

End of Part 1

Any questions?