
Page 1

A brief review of non-neural-network approaches to deep learning

Naiyan Wang

Page 2

Outline

• Non-NN Approaches
  – Deep Convex Net
  – Extreme Learning Machine
  – PCANet
  – Deep Fisher Net (already presented before)

• Discussion

Page 3

Deep Convex Net

• Each module is a two-layer convex network.

• After we obtain the prediction from each module, we concatenate it with the original input and feed it into a new module.
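A minimal sketch of this stacking scheme in Python (the train_module helper and its .predict method are assumptions for illustration, not from the slides):

import numpy as np

def stack_dcn_modules(X, T, n_modules, train_module):
    # Sketch of the stacking scheme described above: each module's prediction
    # is concatenated with the raw input and fed to the next module.
    # `train_module` is an assumed helper that fits one two-layer convex
    # module and returns an object with a .predict() method.
    modules = []
    inputs = X                                   # the first module sees the raw input
    for _ in range(n_modules):
        module = train_module(inputs, T)         # fit one module on the current input
        Y = module.predict(inputs)               # this module's prediction
        modules.append(module)
        inputs = np.concatenate([X, Y], axis=1)  # raw input + prediction for the next module
    return modules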

Page 4

Deep Convex Net

• For each module:
  – We minimize the squared error between the module's output and the targets.
  – U has a closed-form solution.
  – Learning of W relies on gradient descent.
  – Note that no global fine-tuning is involved, so the net can be stacked to more than 10 layers. (Fast training!)
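A hedged reconstruction of the per-module training problem, following the usual deep convex net formulation (assuming hidden activations H = sigma(W^T X), output weights U, and targets T; the notation is an assumption, not taken from the slides):

\min_{W,\,U}\; \bigl\lVert U^{\top} H - T \bigr\rVert_F^2, \qquad H = \sigma\!\bigl(W^{\top} X\bigr)

% For a fixed W (hence fixed H), U has the closed-form least-squares solution
U = \bigl(H H^{\top}\bigr)^{-1} H\, T^{\top}

% W is then updated by gradient descent on the same objective,
% with U held at this closed-form value.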

Page 5

Deep Convex Net

• It is a bit weird why this works.
• The learned features in the middle layers are NOT representative of the input.
• Maybe learning the correlation between the prediction and the input could help?
• Discussion?

Page 6

Deep Convex Net

Page 7

Deep Convex Net

Page 8

Extreme Learning Machine

• It is also a two-layer network:
  – The first layer performs a random projection of the input data.
  – The second layer performs OLS/ridge regression to learn the output weights.
• After that, we can take the transpose of the learned weights as the projection matrix and stack several ELMs into a deep one.
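A minimal numpy sketch of a single ELM as just described (the function names, activation, and ridge parameter are illustrative assumptions):

import numpy as np

def elm_fit(X, T, n_hidden, ridge=1e-2, seed=0):
    # One ELM: a random, untrained first layer plus a ridge-regression output layer.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random projection (never trained)
    H = np.tanh(X @ W)                                # hidden activations
    # Ridge-regression output weights: (H'H + lambda*I)^{-1} H'T
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, beta

def elm_predict(X, W, beta):
    return np.tanh(X @ W) @ beta

For the stacked variant mentioned above, the transpose of the learned weights would be reused as the projection matrix for the next layer; that step is omitted here.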

Page 9

Extreme Learning Machine

• Extremely fast learning.
• Note that even with a simple random projection and a linear transformation, the results can still be improved!

Page 10

PCANet

• In the first two layers, patch-level PCA is used to learn the filters.

• Then it binarizes the outputs of the second layer and computes histograms within local blocks.
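A rough numpy sketch of the two ingredients just described; the patch size, filter count, and function names are illustrative, and the convolution and block-histogram steps are omitted:

import numpy as np

def pca_filters(images, k=7, n_filters=8):
    # One stage of PCANet-style filter learning: the leading principal
    # components of mean-removed k x k patches become the filters.
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())          # remove the patch mean
    P = np.stack(patches)
    _, _, Vt = np.linalg.svd(P, full_matrices=False)  # right singular vectors = PCA directions
    return Vt[:n_filters].reshape(n_filters, k, k)

def binary_hash(response_maps):
    # Binarize the second-stage outputs and pack the bits into one integer per
    # pixel; block-wise histograms of these codes form the final feature.
    bits = [(m > 0).astype(np.int64) for m in response_maps]
    return sum(b << i for i, b in enumerate(bits))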

Page 11

PCANet

• To learn the filters, the authors also proposed using random initialization and LDA.

• The results are acceptable on a wide range of datasets.

Page 12

Summary

• Most of the papers (except Deep Fisher Net) report their results on relatively toy data, so we cannot draw firm conclusions about their performance.

• This could point us toward some possible research directions.

Page 13

Discussion

• Why do deep architectures always help? (We are not concerned about overfitting for now.)
  – The representation power increases exponentially as more layers are added.
  – However, the number of parameters increases only linearly as more layers are added.
• Given a fixed parameter budget, this is a better way to organize the model.
• Take PCANet as an example: if there are m and n neurons in the first and second layers, then there exists an equivalent single-layer net with m*n neurons.
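An illustrative count of this linear-vs-exponential trade-off (the filter sizes here are assumptions, not from the slides): with m = n = 8 filters of size 7 x 7 in each of the two stages,

8 \times 7^2 + 8 \times 7^2 = 784 \quad \text{parameters, vs.} \quad (8 \times 8) \times (7 + 7 - 1)^2 = 64 \times 169 = 10816

for a single-layer net that realizes the same m*n composite filters over their equivalent 13 x 13 support.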

Page 14

Discussion

• Why is CNN so successful in image classification?
  – Data abstraction
  – Locality! (The image is a 2D structure with strong local correlation.)
• The convolution architecture can propagate local information to a broader region:
  – If the 1st-layer filter is m * m and the 2nd-layer filter is n * n, a 2nd-layer unit corresponds to an (m + n - 1) * (m + n - 1) region of the original image (a quick check follows below).
• This advantage is further expanded by spatial pooling.
• Are there other ways to address these two issues simultaneously?
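A quick check of the receptive-field arithmetic above (a hypothetical helper, assuming stride-1 convolutions without pooling or dilation):

def receptive_field(kernel_sizes):
    # Receptive field of stacked stride-1 convolutions: sum(k_i) - (L - 1),
    # which reduces to m + n - 1 for two layers.
    return sum(kernel_sizes) - (len(kernel_sizes) - 1)

print(receptive_field([3, 5]))   # 3 + 5 - 1 = 7 -> a 7 x 7 region of the original image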

Page 15

Discussion

• Convolution is a dense architecture. It induces a lot of unnecessary computation.

• Could we come up with a greedy or more clever selection in each layer so that we focus only on the discriminative patches?

• Or possibly a “convolutional cascade”?

Page 16

Discussion

• Random weights are adopted several times, and they yield acceptable results.
• Pros:
  – Data independent
  – Fast
• Cons:
  – Data independent

• So could we combine random weights and learned weights to combat overfitting?

• Some work has been done on combining deterministic NNs and stochastic NNs.