convolutional networks - université de montréal · convolutional neural networks •object...

65
1 Convolutional Networks Honglak Lee CSE division, EECS department University of Michigan, Ann Arbor 8/6/2015 Deep Learning Summer School @ Montreal

Upload: others

Post on 21-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

1

Convolutional Networks

Honglak Lee

CSE division, EECS department

University of Michigan, Ann Arbor

8/6/2015

Deep Learning Summer School @ Montreal

Page 2: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

2

Unsupervised Convolutional Networks

Page 3: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

3

Natural Images Learned bases: “Edges”

50 100 150 200 250 300 350 400 450 500

50

100

150

200

250

300

350

400

450

500

50 100 150 200 250 300 350 400 450 500

50

100

150

200

250

300

350

400

450

500

50 100 150 200 250 300 350 400 450 500

50

100

150

200

250

300

350

400

450

500

~ + +

x ~ 1 * b36 + 1 * b42

+ 1 * b65

[0, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0, 1, …] = coefficients (feature representation)

Test example

Motivation? Salient features, Compact representation

Compact & easily interpretable

Learning Feature Hierarchy [Lee et al., NIPS 2007; Ranzato et al., 2007]

Page 4: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

4

Input image (pixels)

“Sparse coding”

(edges)

Note: No explicit “pooling.”

[Related work: Bengio et al., 2006; Ranzato et al., 2007, and others.]

Lee et al., NIPS 2007: DBN (Hinton et al., 2006) with additional sparseness constraint.

Higher layer

(Combinations

of edges)

Describe more concretely..

Learning Feature Hierarchy

Page 5: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

5

• Learning objects and parts in images

• Large image patches contain interesting higher-level structures. – E.g., object parts and full objects

• Challenge: high-dimensionality and spatial correlations

Learning object representations

Page 6: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

6

Example image

“Filtering” output

“Shrink” (max over 2x2)

filter1 filter2 filter3 filter4

“Eye detector” Advantage of shrinking 1. Filter size is kept small 2. Invariance

Illustration: Learning an “eye” detector

Related work: Convnets by LeCun et al., 1989

Page 7: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

7

Weight sharing by “filtering” (convolution) [Lecun et al., 1989]

“Max-pooling” Invariance Computational efficiency

Convolutional Restricted Boltzmann machine. - Unsupervised - Probabilistic max-pooling - Can be stacked to form convolutional DBN

convolution filter

Detection layer

maximum 2x2 grid

Max-pooling layer

Detection layer

Max-pooling layer

Input

convolution

convolution

maximum 2x2 grid

max

conv

conv

max

Large filter Not computationally

efficient Too much information…

Filtering

Show examples before this figure

Convolutional architectures

Page 8: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

8

Wk

V (visible layer)

Detection layer H

Max-pooling layer P

Visible nodes (binary or real)

At most one hidden nodes are active.

Hidden nodes (binary)

“Filter“ weights (shared)

For “filter” k,

Constraint: At most one hidden node is 1 (active).

‘’max-pooling’’ node (binary)

Input data V

Key properties: - RBM (probabilistic model) - Convolutional structure (weight sharing) - Constraint for max-pooling (“mutual exclusion”)

Convolutional RBM (CRBM) [Lee et al., ICML 2009]

Page 9: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

9

Visible nodes (binary or real)

At most one hidden nodes are active.

Hidden nodes (binary)

‘’max-pooling’’ node (binary)

“Filter“ weights (shared)

For “filter” k,

At most one hidden nodes are active.

Max-pooling layer P

Detection layer H

Input data V

Convolutional RBM (CRBM) [Lee et al., ICML 2009]

Page 10: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

10

h3 h1 h2 h4

y

I1 I2 I3 I4

Pooling node

Detection nodes

Output of convolution W*V from below

Softmax function

Collapse 2n configurations into n+1 configurations.

1

1 0 0 0

1

0 1 0 0

1

0 0 1 0

1

0 0 0 1

0

0 0 0 0

Sample

Inference: probabilistic max-pooling

Page 11: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

11

• Bottom-up (greedy), layer-wise training

– Train one layer (convolutional RBM) at a time.

• Feedforward Inference (approximate)

Convolutional Deep Belief Networks (CDBN)

Page 12: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

12

W1

W2

W3

Input image

Layer 1

Layer 2

Layer 3

Example image

Layer 1 activation (coefficients)

Layer 2 activation (coefficients)

Layer 3 activation (coefficients)

Show only one figure

Filter visualization

Convolutional Deep Belief Networks (CDBN)

Page 13: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

13

First layer bases

Second layer bases

localized, oriented edges

contours, corners, arcs, surface boundaries

Unsupervised learning from natural images

Page 14: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

14

Faces Cars Elephants Chairs

Unsupervised learning of object-parts

Applications: • Classification (ICML 2009, NIPS 2009, ICCV 2011, ICML 2013) • Verification (CVPR 2012) • Image alignment (NIPS 2012)

Page 15: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

16

Convolutional Sparse Coding

• Learning objective

• Learned filters

Kavukcuoglu et al. Learning convolutional feature hierarchies for visual recognition. NIPS 2010.

First layer Second layer

Page 16: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

17

Deconvolutional Networks • Learning objective:

• Learned filters:

Zeiler et al. "Deconvolutional networks." CVPR 2010

Page 17: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

19

Supervised Convolutional Networks

Page 18: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

20

Example: Convolutional Neural Networks

• LeCun et al. 1989

• Neural network with specialized connectivity structure

Slide: R. Fergus

Page 19: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

21

Convolutional Neural Networks

• Feed-forward:

– Convolve input

– Non-linearity (rectified linear)

– Pooling (local max)

• Supervised

• Train convolutional filters by back-propagating classification error

Input Image

Convolution (Learned)

Non-linearity

Pooling

LeCun et al. 1998

Feature maps

Slide: R. Fergus

Page 20: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

22

Components of Each Layer

Pixels /

Features

Filter with

Dictionary (convolutional

or tiled)

Spatial/Feature

(Sum or Max)

Normalization

between

feature

responses

Output

Features

+ Non-linearity

[Optional]

Slide: R. Fergus

Page 21: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

23

Filtering

• Convolutional – Dependencies are local

– Translation equivariance

– Tied filter weights (few params)

– Stride 1,2,… (faster, less mem.)

Input Feature Map

.

.

.

Slide: R. Fergus

Page 22: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

24

Non-Linearity

• Non-linearity

– Per-element (independent)

– Tanh

– Sigmoid: 1/(1+exp(-x))

– Rectified linear

• Simplifies backprop

• Makes learning faster

• Avoids saturation issues

Preferred option

Slide: R. Fergus

Page 23: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

25

Pooling

• Spatial Pooling

– Non-overlapping / overlapping regions

– Sum or max

– Boureau et al. ICML’10 for theoretical analysis

Max

Sum

Slide: R. Fergus

Page 24: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

26

Normalization

• Contrast normalization (across feature maps)

– Local mean = 0, local std. = 1, “Local” 7x7 Gaussian

– Equalizes the features maps

Feature Maps Feature Maps

After Contrast Normalization

Slide: R. Fergus

Page 25: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

28

Applications

• Handwritten text/digits

– MNIST (0.17% error [Ciresan et al. 2011])

– Arabic & Chinese [Ciresan et al. 2012]

– Traffic sign recognition • 0.56% error vs 1.16% for humans [Ciresan et al. 2011]

Slide: R. Fergus

Page 26: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

29

Application: ImageNet

Validation classification

Validation classification

Validation classification

[Deng et al. CVPR 2009]

• ~14 million labeled images, 20k classes

• Images gathered from Internet

• Human labels via Amazon Turk

Page 27: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

30

Krizhevsky et al. [NIPS 2012]

• 7 hidden layers, 650,000 neurons, 60,000,000 parameters • Trained on 2 GPUs for a week

• Same model as LeCun’98 but: - Bigger model (8 layers)

- More data (106 vs 103 images) - GPU implementation (50x speedup over CPU) - Better regularization (DropOut)

Slide: R. Fergus

Page 28: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

31

ImageNet Classification 2012

• Krizhevsky et al. -- 16.4% error (top-5)

• Next best (non-convnet) – 26.2% error

0

5

10

15

20

25

30

35

SuperVision ISI Oxford INRIA Amsterdam

Top

-5 e

rro

r ra

te %

Slide: R. Fergus

Page 29: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

32

ImageNet Classification 2013 Results

• http://www.image-net.org/challenges/LSVRC/2013/results.php

0.1

0.11

0.12

0.13

0.14

0.15

0.16

0.17

Test

err

or

(to

p-5

)

Slide: R. Fergus

Page 30: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

33

ImageNet Classification 2014 Results

0

0.02

0.04

0.06

0.08

0.1

0.12

Classification error (ILSVRC 2014)

Page 31: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

34

Feature Generalization

• Visualization of features (via t-SNE embedding)

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014

Gist DeCAF1 DeCAF6

ILSVRC-2012 validation set

Page 32: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

35

Feature Generalization

• Domain adaptation task

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014

Page 33: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

36

Feature Generalization

• Caltech 101 classification

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell, DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014

Page 34: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

37

Feature Generalization

• Zeiler & Fergus, arXiv 1311.2901, 2013 (Caltech-101,256)

• Girshick et al. CVPR’14 (Caltech-101, SunS)

• Oquab et al. CVPR’14 (VOC 2012)

• Razavian et al. arXiv 1403.6382, 2014 (lots of datasets)

• Pre-train on Imagnet Retrain classifier on Caltech256

6 training examples

From Zeiler & Fergus, Visualizing and Understanding Convolutional Networks, arXiv 1311.2901, 2013 Slide credit: R. Fergus

CNN features

Page 35: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

38

Feature generalization over multiple tasks

• Generalization over multiple tasks

Ali Razavian, H. Azizpour, J. Sullivan, S. Carlsson, CNN Features off-the-shelf: an Astounding Baseline for Recognition, Arxiv 2014 P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. ICLR 2014

Page 36: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

41

Using very deep layers: VGG Network

• Main idea: use many small convolutions with deep layers

Simouyan et al., Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

Page 37: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

42

Going deeper: GoogLeNet

• Main idea: use multiple receptive fields + go deep

Szegedy et al. "Going deeper with convolutions." CVPR 2015

Page 38: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

43

Going deeper: GoogLeNet

• Main idea: use multiple receptive fields + go deep

Szegedy et al. "Going deeper with convolutions." CVPR 2015

Page 39: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

44

Experimental results on ILSVRC

Simouyan et al., Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

Page 40: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

45

Other vision applications

Page 41: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

46

Object detection using multi-scale CNN

Kavukcuoglu et al. Learning convolutional feature hierarchies for visual recognition. NIPS 2010.

• Initialization for convolutional network

Page 42: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

47

Object detection using Convolutional Neural Networks

• Object detection systems based on the deep convolutional neural network (CNN) have recently made ground-breaking advances.

• The state-of-the-art: “Regions with CNN” (R-CNN)

Girshick et al, “Region-based Convolutional Networks for Accurate Object Detection and Semantic Segmentation”, PAMI, 2015.

Page 43: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

48

CNN Object detection with Bayesian optimization

Iterative procedure

IoU>0.5 IoU>0.7

Mean Average Precision Standard

localization

More accurate

localization

R-CNN (VGGNet) 65.4 35.2

Zhang et al., 2015 68.5 43.0

More accurate

localization

35.2

43.0

Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction. Zhang, Sohn, Villegas, Pan, and Lee, CVPR 2015

Page 44: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

49

CNN object detection with structured loss

• Linear classifier 𝑔 𝑥;𝑤 = argmax𝑦∈𝒴 𝑓(𝑥, 𝑦; 𝑤)

𝑓 𝑥, 𝑦; 𝑤 = 𝑤⊤𝜙 𝑥, 𝑦

𝜙 𝑥, 𝑦 = 𝜙 𝑥, 𝑦 , 𝑙 = +1𝟎, 𝑙 = −1

• Minimizing the structured loss (Blaschko and Lampert, 2008)*

𝑤 = argmax𝑤 Δ 𝑔 𝑥𝑖; 𝑤 , 𝑦𝑖

𝑀

𝑖=1

Δ(𝑦, 𝑦𝑖) = 1 − IoU 𝑦, 𝑦𝑖 , if 𝑙 = 𝑙𝑖 = 10, if 𝑙 = 𝑙𝑖 = −11, if 𝑙 ≠ 𝑙𝑖

* Blaschko and Lampert, “Learning to localize objects with structured output regression”, ECCV, 2008.

CNN features

The oracle detector gives zero loss.

Other related work: LeCun et al. 1989; Taskar et al. 2005; Joachims et al. 2005; Veldaldi et al. 2014; Thomson et al. 2014; and many others

Page 45: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

50

CNN object detection with structured loss

• The objective is hard to solve. Replace it with an upper-bound surrogate using structured SVM framework

min𝑤

1

2∥ 𝑤 ∥2 +

𝐶

𝑀

𝑀

𝑖=1

𝜉𝑖 , subject to

𝑤⊤𝜙 𝑥𝑖 , 𝑦𝑖 ≥ 𝑤⊤𝜙 𝑥𝑖 , 𝑦 + Δ 𝑦, 𝑦𝑖 − 𝜉𝑖 , ∀𝑦 ∈ 𝒴, ∀𝑖

𝜉𝑖 ≥ 0, ∀𝑖

• The constraints can be re-written as:

𝑤⊤𝜙(𝑥𝑖 , 𝑦𝑖) ≥ 1 − 𝜉𝑖 , ∀𝑖 ∈ 𝐼pos,

𝑤⊤𝜙 𝑥𝑖 , 𝑦 ≤ −1 + 𝜉𝑖 , ∀𝑦 ∈ 𝒴, ∀𝑖 ∈ 𝐼𝑛𝑒𝑔,

𝑤⊤𝜙 𝑥𝑖 , 𝑦𝑖 ≥ 𝑤𝛵𝜙 𝑥𝑖 , 𝑦 + Δloc 𝑦, 𝑦𝑖 − 𝜉𝑖 ,

∀𝑦 ∈ 𝒴, ∀𝑖 ∈ 𝐼pos,

where Δloc(𝑦, 𝑦𝑖) = 1 − IoU(𝑦, 𝑦𝑖).

Recognition

Localization

Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction. Zhang, Sohn, Villegas, Pan, and Lee, CVPR 2015

Page 46: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

51

Image segmentation and parsing

Farabet et al., Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML 2012

Page 47: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

52

Other Applications

• Tracking (Bazzani et. al. 2010, and many others)

• Pose estimation (Toshev et al. 2013, Jain et al., 2013, …)

• Caption generation (Vinyals et al. 2015, Xu et al. 2015, …)

Page 48: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

53

Industry Deployment

• Used in Facebook, Google, Microsoft

• Image Recognition, Speech Recognition, ….

• Fast at test time

Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14

Slide: R. Fergus

Page 49: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

54

Deep Visual-Semantic Embedding

Skipgram for word embedding

CNN

Frome et al., NIPS 2013

Visualization of label embedding

Page 50: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

56

Multiple output embeddings for zero-shot learning

Classification using compatibility function:

Combination of multiple output embeddings:

Akata, Reed, Walter, Lee, & Schiele. Evaluation of Output Embeddings for Fine-Grained Image Classification. CVPR 2015.

Page 51: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

57

Convolutional networks for other domains: speech

Page 52: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

58

hkj

ykα

Wk

vi

...

...

...

...

pooling layer

detection layer

visible layer hj

Wk

vi

...

...

...

...

max pooling layer

detection layer

visible layer

Convolutional RBM for time-series data

Page 53: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

59

Spectrogram

Detection nodes

Max pooling node

time

freq

uen

cy

Convolutional DBN for audio [NIPS 2009]

Page 54: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

60

Spectrogram time

freq

uen

cy

Convolutional DBN for audio [NIPS 2009]

Page 55: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

61

Convolutional DBN for audio [NIPS 2009]

Page 56: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

62

One CDBN layer Detection nodes

Max pooling

Detection nodes

Max pooling Second CDBN layer

Convolutional DBN for audio [NIPS 2009]

Page 57: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

63

Learned first-layer bases

Trained on unlabeled TIMIT corpus

CDBNs for speech

Page 58: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

64

Ph

on

eme

Firs

t la

yer

bas

es

“oy” “el” “s”

Comparison of bases to phonemes

Page 59: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

65

Phonemes First layer bases Second layer bases

Fem

ale

Mal

e

Comparison of bases to gender (“ae” phoneme)

Page 60: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

66

• Speaker identification

• Phoneme classification

• Gender classification

Use same set of learned features (computed from the same CDBN) for all three tasks.

Application to speech recognition tasks

Page 61: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

67

The CDBN features outperform the MFCC features especially when the number of training examples is small.

MFCC: Reynolds (1995)

30

40

50

60

70

80

90

100

1 2 3 4 5 6 7 8

MFCC

CDBN

# training sentences per speaker

Cla

ssif

icat

ion

acc

ura

cy

* 168 speakers, 10 sentences/speaker.

Speaker Identification [NIPS 2009]

Page 62: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

68

77

78

79

80

81

Clarkson & Moreno (1999)

Gunawardana et al. (2005)

Sung et al. (2007)

Petrov et al. (2007)

Sha & Saul (2006)

Yu et al. (2009)

MFCC+CDBN

Our method

Test

Acc

ura

cy

Highlight is that we actually do better

* Tested on the (standard) TIMIT core test set.

Phoneme Classification [NIPS 2009]

Page 63: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

69

50

60

70

80

90

100

1 2 3 4 5 6 7 8 9 10

MFCC

CDBN layer1

CDBN layer2

The CDBN features outperform the MFCC features. The second layer CDBN features give better performance than the first layer CDBN features.

# training sentences per gender

Test

Acc

ura

cy

Gender Classification [NIPS 2009]

Page 64: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

70

Convolutional neural networks for speech recognition

Abdel-Hamid, O., Mohamed, A. R., Jiang, H., & Penn, G. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In ICASSP 2012.

Page 65: Convolutional Networks - Université de Montréal · Convolutional Neural Networks •Object detection systems based on the deep convolutional neural network (CNN) have recently made

71

Convoultional networks for music recommendation

Image from: http://benanne.github.io/2014/08/05/spotify-cnns.html Related work: Van den Oord, Dieleman & Schrauwen. Deep content-based music recommendation. In NIPS 2013