Su-A Kim

12th August 2014

Convolutional Neural Networks

ConvNet● ○ ○ ○ ○ ○ ○ ○ ○ ○

DeepFace○ ○ ○ ○ ○ ○ ○

Table of contents

Introduction to convolutional neural networks

Introduction to an application paper: "DeepFace: Closing the Gap to Human-Level Performance in Face Verification", CVPR 2014

Su-A Kim, 12th August 2014 @CVLAB

History

Yann LeCun

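In 1995, Yann LeCun and Yoshua Bengio described convolutional neural networks in "Convolutional Networks for Images, Speech, and Time Series"; LeCun had already applied the idea to handwritten digit recognition in 1989.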

Yoshua Bengio

Recap of ConvNet

Neural network with specialized connectivity structure

Feed-forward:
- Convolve input
- Non-linearity (rectified linear)
- Pooling (local max)

Supervised

Train convolutional filters by back-propagating classification error

[Figure: input image → convolution (learned) → non-linearity → pooling → feature maps]
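The feed-forward stage (convolve, rectify, pool) can be sketched in plain Python. This is a minimal illustration, not the slide author's code: the 5x5 toy image and the hand-set vertical-edge filter are assumptions made for the example, whereas in a real ConvNet the filter values are learned.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation with stride 1 (the 'convolution' of ConvNets)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear non-linearity applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping local max pooling; incomplete border windows are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark on the left, bright on the right (edge between columns 1 and 2).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical hand-set filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = max_pool(relu(conv2d(image, kernel)))
print(features)  # 2x2 feature map; the left column carries the edge response
```

The pooled map is smaller than the input but still records where the edge was found, which is the point of stacking these three steps.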

Slide: R. Fergus

Connectivity and weight sharing depend on the layer

[Figure: three layer types, labeled "All different weights", "All different weights", and "Shared weights"]

A convolution layer has far fewer parameters thanks to local connections and weight sharing.
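To make the parameter savings concrete, the counts can be worked out for one small layer. The sketch below assumes a single-channel 32x32 input, one 5x5 window per output unit, one output feature map, and no bias terms; these sizes are chosen only for illustration.

```python
def fully_connected_params(in_h, in_w, out_units):
    """Every output unit has its own weight for every input pixel."""
    return in_h * in_w * out_units


def locally_connected_params(in_h, in_w, k):
    """Every output position has its own k x k filter (local, not shared)."""
    out_h, out_w = in_h - k + 1, in_w - k + 1
    return out_h * out_w * k * k


def convolution_params(k):
    """One k x k filter is shared by all output positions."""
    return k * k


in_h = in_w = 32
k = 5
out_units = (in_h - k + 1) * (in_w - k + 1)  # 28 * 28 = 784 output positions

print(fully_connected_params(in_h, in_w, out_units))  # 802816
print(locally_connected_params(in_h, in_w, k))        # 19600
print(convolution_params(k))                          # 25
```

Local connectivity alone cuts the count by a factor of about 40 here, and weight sharing reduces it to the size of a single filter.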


Convolution layer

Detects the same feature at different positions in the input image.

[Figure: input, filter (kernel), feature map]

Slide: R. Fergus
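A small experiment shows what "the same feature at different positions" means: when the pattern moves in the input, the peak of the feature map moves with it. The cross-shaped pattern, the 6x6 canvas, and the helper names `place` and `peak` are assumptions for this sketch.

```python
def correlate2d(image, kernel):
    """Slide one shared kernel over the image (valid positions, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def place(pattern, top, left, size=6):
    """Draw the pattern on an empty size x size canvas at (top, left)."""
    canvas = [[0] * size for _ in range(size)]
    for a, row in enumerate(pattern):
        for b, value in enumerate(row):
            canvas[top + a][left + b] = value
    return canvas


def peak(fmap):
    """(row, col) of the strongest response in the feature map."""
    positions = [(i, j) for i in range(len(fmap)) for j in range(len(fmap[0]))]
    return max(positions, key=lambda p: fmap[p[0]][p[1]])


cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]

# The same filter is used for both images; only the pattern location changes.
peak_a = peak(correlate2d(place(cross, 0, 1), cross))
peak_b = peak(correlate2d(place(cross, 3, 2), cross))
print(peak_a, peak_b)  # (0, 1) (3, 2)
```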


Non-linearity

Tanh

Sigmoid: 1/(1+exp(-x))

Rectified linear (ReLU): max(0,x)
- Simplifies backprop
- Makes learning faster
- Makes features sparse

→ Preferred option
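The three non-linearities and their derivatives fit in a few lines, which helps show why ReLU is easy to back-propagate through. The sample inputs below are arbitrary values picked for this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # The gradient is either passed through unchanged or blocked entirely.
    return 1.0 if x > 0 else 0.0


inputs = [-4.0, -1.0, 0.5, 3.0]
activations = [relu(x) for x in inputs]
# Fraction of units that are exactly zero after ReLU ("sparse features").
sparsity = sum(1 for a in activations if a == 0.0) / len(activations)

print(activations, sparsity)
print(sigmoid_grad(4.0), tanh_grad(4.0), relu_grad(4.0))
```

For large positive inputs the sigmoid and tanh gradients shrink toward zero, while the ReLU gradient stays at 1, which is one common explanation for the faster learning noted above.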



Sub-sampling layer

Spatial pooling
- Average or max
- Boureau et al., ICML'10, for theoretical analysis → a study finding that max works better

Role of pooling
- Invariance to small transformations
- Reduces the effect of noise, shift, and distortion
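The invariance claim can be checked on a toy feature map: moving an activation by one pixel leaves the max-pooled output unchanged, as long as the activation stays inside the same pooling window. The 4x4 maps and the 2x2 window are assumptions for this sketch.

```python
def pool(fmap, size, reduce):
    """Non-overlapping pooling; `reduce` maps a list of window values to one value."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            reduce([fmap[i * size + a][j * size + b]
                    for a in range(size) for b in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def mean(values):
    return sum(values) / len(values)


# One strong activation at (0, 0), and the same activation shifted to (1, 1).
original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 9, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]

print(pool(original, 2, max))   # [[9, 0], [0, 0]]
print(pool(shifted, 2, max))    # [[9, 0], [0, 0]]
print(pool(original, 2, mean))  # [[2.25, 0.0], [0.0, 0.0]]
```

Average pooling is equally insensitive to this shift but dilutes the strong response, whereas max pooling keeps its full value.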

Slide: R. Fergus

[Figure: max pooling vs. sum pooling]


Normalization

Contrast normalization (between/across feature maps)
- Equalizes the feature maps → brings out features that are less detailed

[Figure: feature maps before and after contrast normalization]
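The equalizing effect can be illustrated with a deliberately simplified version of contrast normalization. The sketch below normalizes each feature map with its global mean and standard deviation. This is an assumption made for brevity: practical schemes typically use a local window and may also normalize across maps.

```python
import math


def contrast_normalize(fmap, eps=1e-8):
    """Shift a feature map to zero mean and scale it to unit standard deviation."""
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    # eps guards against division by zero on a constant map.
    return [[(v - mean) / (std + eps) for v in row] for row in fmap]


# Two maps with the same structure but very different response strength.
weak = [[0.1, 0.2],
        [0.3, 0.4]]
strong = [[10.0, 20.0],
          [30.0, 40.0]]

weak_n = contrast_normalize(weak)
strong_n = contrast_normalize(strong)
print(weak_n)
print(strong_n)
```

After normalization the weak map and the strong map are numerically almost identical, so a faint pattern is no longer drowned out by a high-contrast one.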



LeNet 5

C1, C3, C5: convolutional layers (5 × 5 convolution matrix)
S2, S4: subsampling layers (by a factor of 2)
F6: fully connected layer

About 187,000 connections. About 14,000 trainable weights.
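The layer list implies a simple size calculation: each 5 × 5 convolution without padding shrinks the map by 4 pixels per side length, and each subsampling halves it. The sketch below traces this, assuming the 32x32 input of the original LeNet-5 (the input size is not stated on the slide).

```python
def conv_out(size, kernel):
    """Output side length of a convolution with stride 1 and no padding."""
    return size - kernel + 1


def subsample_out(size, factor):
    """Output side length after subsampling by an integer factor."""
    return size // factor


size = 32  # assumed input side length
trace = [("input", size)]
for name in ["C1", "S2", "C3", "S4", "C5"]:
    if name.startswith("C"):
        size = conv_out(size, 5)
    else:
        size = subsample_out(size, 2)
    trace.append((name, size))

for name, side in trace:
    print(f"{name}: {side}x{side}")
```

Under this assumption the maps shrink from 32x32 to 28, 14, 10, 5 and finally 1x1 at C5, which is why C5 feeds directly into the fully connected layer F6.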


LeNet 5

Robust to noise as well


About CNNs

A special kind of multi-layer neural network.

Implicitly extracts relevant features.

A feed-forward network that can extract topological properties from an image.

Like almost every other neural network, CNNs are trained with a version of the back-propagation algorithm.


Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
Facebook AI Research, Tel Aviv University

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset


Architecture

Face Alignment

Representation (CNN)


Face Alignment

(1) 2D alignment
- After detecting the face region, extract 6 fiducial points.
- Fiducial point extraction: a pre-trained SVR (Support Vector Regressor) predicts the points, using LBP histograms as the descriptor.

(2) 3D alignment
- 67 landmarks, landmark mapping
- 2D-3D alignment → frontalization → 2D projection
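One common way to use a handful of fiducial points for 2D alignment is to fit a similarity transform (scale, rotation, translation) that maps the detected points onto fixed template positions. The sketch below is a generic least-squares fit written with complex numbers. It is not the DeepFace implementation, and the template coordinates and the simulated detection are hypothetical.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares a, b such that a * z + b maps src points onto dst points.

    Points are (x, y) pairs treated as complex numbers x + iy, so the complex
    factor a encodes scale (abs(a)) and rotation (cmath.phase(a)).
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(p.real, p.imag) for p in mapped]


# Hypothetical template: two eyes, nose tip, three mouth points.
template = [(50, 60), (102, 60), (76, 90), (56, 115), (76, 118), (96, 115)]

# Simulated detection: the template scaled by 1.5, rotated by 0.3 rad, and shifted.
detected = apply_similarity(cmath.rect(1.5, 0.3), complex(10, 5), template)

a, b = fit_similarity(detected, template)
aligned = apply_similarity(a, b, detected)
print(abs(a), cmath.phase(a))  # scale and rotation that undo the simulated pose
```

Applying the fitted transform to the whole image (not shown) would bring the face into the template pose before the 3D step.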


Representation

C1-M2-C3

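Extracts low-level features (simple edges and texture)

Max-pooling is applied only to the first convolution layer. Why? According to the paper, several levels of pooling would cause the network to lose information about the precise position of detailed facial structure and micro-textures.

Input: 152x152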


Representation

L4-L5-L6 (locally connected)

Input: 152x152

Why use locally connected layers? Each region of the aligned image has different local statistics, so each location learns its own filters.
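The difference from a convolution layer is easiest to see in code: a locally connected layer still looks at a small window per output position, but every position owns a separate filter. The 3x3 image and the four hand-set 2x2 filters below are assumptions for this sketch.

```python
def locally_connected(image, filters):
    """filters[i][j] is the k x k kernel used only at output position (i, j)."""
    k = len(filters[0][0])
    return [
        [
            sum(image[i + a][j + b] * filters[i][j][a][b]
                for a in range(k) for b in range(k))
            for j in range(len(filters[0]))
        ]
        for i in range(len(filters))
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Four different filters, one per output position (nothing is shared).
filters = [
    [[[1, 0], [0, 0]], [[0, 1], [0, 0]]],
    [[[0, 0], [1, 0]], [[0, 0], [0, 1]]],
]

output = locally_connected(image, filters)
num_params = sum(len(kernel) * len(kernel[0]) for row in filters for kernel in row)
print(output)      # [[1, 3], [7, 9]]
print(num_params)  # 16, versus 4 for one shared 2x2 convolution filter
```

Each output position here picks a different corner of its window, something a single shared filter could not do. The cost is a parameter count that grows with the number of positions.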



Shared weights




Representation

C1-M2-C3

L4-L5-L6 (locally connected)

F7-F8 (fully connected)

Input: 152x152

The fully connected layers can capture correlations between features extracted from distant parts of the face.

Output of F7: raw face representation feature vector

Output of F8: used to compute the probability distribution over the class labels
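Turning the outputs of the last layer into a probability distribution over class labels is usually done with a softmax. A minimal sketch follows. The three scores are made-up values standing in for F8 outputs.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 0.5, -1.0]  # hypothetical F8 outputs for three identities
probs = softmax(scores)
predicted = probs.index(max(probs))
print(probs, predicted)
```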

Why use locally connected layers?


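Training

The goal is to maximize the probability of the correct class.

The loss is minimized with respect to the parameters: gradients are computed by back-propagation, and the parameters are updated with stochastic gradient descent (SGD).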
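Maximizing the probability of the correct class is equivalent to minimizing the cross-entropy loss, and one SGD update can be written out for a single softmax layer. The sketch below covers only that top layer. The tiny weight matrix, the input vector, the label and the learning rate are assumptions, and a full network would back-propagate the same gradient through the earlier layers.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, x, label, lr=0.1):
    """One SGD update on one example; returns (new_weights, loss before the update)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(scores)
    loss = -math.log(probs[label])  # cross-entropy for the correct class
    new_weights = [
        # d(loss)/d(w[k][d]) = (p_k - 1[k == label]) * x_d
        [w - lr * (probs[k] - (1.0 if k == label else 0.0)) * xi
         for w, xi in zip(row, x)]
        for k, row in enumerate(weights)
    ]
    return new_weights, loss


weights = [[0.0, 0.0] for _ in range(3)]  # 3 classes, 2 input features
x, label = [1.0, 2.0], 2

losses = []
for _ in range(5):
    weights, loss = sgd_step(weights, x, label)
    losses.append(loss)
print(losses)  # the loss shrinks as the correct class becomes more probable
```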


Result

Reduces the error of the previous best methods by more than 50%

The YouTube Faces (YTF) dataset contains about 100 mislabeled items; taking those into account, the accuracy is about 92.5%.


Reference

[1] Bouchain, David. "Character recognition using convolutional neural networks." Institute for Neural Information Processing 2007 (2006).

[2] Bouvrie, Jake. "Notes on convolutional neural networks." (2006).

[3] Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep sparse rectifier networks." Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP Volume. Vol. 15. 2011.

[4] Ahonen, Timo, Abdenour Hadid, and Matti Pietikainen. "Face description with local binary patterns: Application to face recognition." Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.12 (2006): 2037-2041.

[5] Bengio, Yoshua. "Learning deep architectures for AI." Foundations and trends® in Machine Learning 2.1 (2009): 1-127.

