01 Artificial neuron (WordPress.com, 1/1/2018)
TRANSCRIPT
Neural networks: Feedforward neural network - artificial neuron

ARTIFICIAL NEURON 2
Topics: connection weights, bias, activation function
• Neuron pre-activation (or input activation): a(x) = b + \sum_i w_i x_i = b + w^\top x
• Neuron (output) activation: h(x) = g(a(x)) = g(b + \sum_i w_i x_i)
• w are the connection weights
• b is the neuron bias
• g(·) is called the activation function
[Figure: a neuron receiving inputs x_1, ..., x_d through weights w_1, ..., w_d, plus the bias b, and producing the output h(x).]
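To make the two equations concrete, here is a minimal NumPy sketch of a single artificial neuron (my own illustration, not from the slides; the sigmoid used for g here is one of the activation functions defined later in the lecture):

```python
import numpy as np

def sigmoid(a):
    # g(a) = 1 / (1 + exp(-a)): squashes the pre-activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b, g=sigmoid):
    # Pre-activation: a(x) = b + w^T x
    a = b + np.dot(w, x)
    # Output activation: h(x) = g(a(x))
    return g(a)

# Example: a neuron with d = 2 inputs
x = np.array([1.0, -2.0])
w = np.array([0.5, 0.3])   # connection weights
b = -0.1                   # bias
print(neuron(x, w, b))
```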
ARTIFICIAL NEURON 3
Topics: connection weights, bias, activation function
[Figure: a small network with inputs x1, x2, hidden units y1, y2 and output z, with weight values written on the connections (w_ji from inputs to hidden units, w_kj from hidden units to the output) and the resulting decision regions R1, R2 in the (x1, x2) plane, where z = +1 or z = -1. French labels: "entrée i" (input i), "cachée j" (hidden j), "sortie k" (output k), "biais" (bias). From Pascal Vincent's slides.]
(Annotation on the slide: the range of h(x) is determined by g(·); the bias only changes the position of the ridge.)
Neural networks: Feedforward neural network - activation function

ARTIFICIAL NEURON 5
Topics: connection weights, bias, activation function
• Neuron pre-activation (or input activation): a(x) = b + \sum_i w_i x_i = b + w^\top x
• Neuron (output) activation: h(x) = g(a(x)) = g(b + \sum_i w_i x_i)
• w are the connection weights
• b is the neuron bias
• g(·) is called the activation function
ACTIVATION FUNCTION 6
Topics: linear activation function
• g(a) = a
• Performs no input squashing
• Not very interesting...
ACTIVATION FUNCTION 7
Topics: sigmoid activation function
• g(a) = \mathrm{sigm}(a) = \frac{1}{1 + \exp(-a)}
• Squashes the neuron's pre-activation between 0 and 1
• Always positive
• Bounded
• Strictly increasing
ACTIVATION FUNCTION 8
Topics: hyperbolic tangent ("tanh") activation function
• g(a) = \tanh(a) = \frac{\exp(a) - \exp(-a)}{\exp(a) + \exp(-a)} = \frac{\exp(2a) - 1}{\exp(2a) + 1}
• Squashes the neuron's pre-activation between -1 and 1
• Can be positive or negative
• Bounded
• Strictly increasing
ACTIVATION FUNCTION 9
Topics: rectified linear activation function
• g(a) = \mathrm{reclin}(a) = \max(0, a)
• Bounded below by 0 (always non-negative)
• Not upper bounded
• Strictly increasing
• Tends to give neurons with sparse activities
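A small NumPy sketch (my own, not from the slides) implementing the four activation functions just listed:

```python
import numpy as np

def linear(a):
    # g(a) = a: performs no squashing
    return a

def sigm(a):
    # g(a) = 1 / (1 + exp(-a)): output in (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # g(a) = (exp(2a) - 1) / (exp(2a) + 1): output in (-1, 1)
    return np.tanh(a)

def reclin(a):
    # g(a) = max(0, a): non-negative, unbounded above, gives sparse activities
    return np.maximum(0.0, a)

a = np.linspace(-3, 3, 7)
for g in (linear, sigm, tanh, reclin):
    print(g.__name__, np.round(g(a), 3))
```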
ACTIVATION FUNCTION 10
Topics: maxout activation function
• g(x) = \max_{j \in [1, k]} a_j, where a_j = b_j + w_j^\top x
• Not lower / upper bounded
• Does not give neurons with sparse activities
• But gradients are sparse
[Figure: depending on its linear pieces, a maxout unit h_i(x) can implement a rectifier, an absolute-value function, or a quadratic-looking function.]
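A hypothetical NumPy sketch of one maxout unit with k linear pieces (the shapes and names are my own choices, not from the slides):

```python
import numpy as np

def maxout(x, W, b):
    # W has shape (k, d), b has shape (k,): k linear pieces a_j = b_j + w_j^T x.
    # The unit outputs the maximum over the pieces: g(x) = max_j a_j.
    a = b + W @ x
    return np.max(a)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3))   # k = 4 pieces, d = 3 inputs
b = rng.normal(size=4)
print(maxout(x, W, b))
```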
ACTIVATION FUNCTION 11
Topics: modern activation functions
• Most models (approx. 95%) use ReLUs.
• Exception: recurrent neural networks usually do not use ReLUs
‣ but see Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton (2015), https://arxiv.org/pdf/1504.00941v2.pdf
Neural networks: Feedforward neural network - capacity of single neuron
ARTIFICIAL NEURON 13
Topics: connection weights, bias, activation function
[Figure: the same small network as in slide 3 — inputs x1, x2, hidden units y1, y2, output z, with French labels "entrée" (input), "cachée" (hidden), "sortie" (output), "biais" (bias). From Pascal Vincent's slides.]
(Annotation on the slide: the range of h(x) is determined by g(·); the bias only changes the position of the ridge.)
ARTIFICIAL NEURON 14
Topics: capacity, decision boundary of neuron
• Could do binary classification:
‣ with sigmoid, can interpret the neuron as estimating p(y = 1 | x)
‣ also known as the logistic regression classifier
‣ if p(y = 1 | x) is greater than 0.5, predict class 1
‣ otherwise, predict class 0 (a similar idea can apply with tanh)
• The decision boundary is linear
[Figure: decision regions R1 and R2 in the (x1, x2) plane, separated by a linear boundary. From Pascal Vincent's slides; the slide title translates as "The expressive power of neural networks".]
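As an illustrative sketch of the decision rule just described (my own, not from the slides):

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(x, w, b):
    # p(y = 1 | x) is estimated by h(x) = sigm(b + w^T x);
    # predict class 1 when the estimate exceeds 0.5, class 0 otherwise.
    p = sigm(b + np.dot(w, x))
    return int(p > 0.5)

# Since sigm(a) > 0.5 exactly when a > 0, the boundary b + w^T x = 0 is linear.
w = np.array([1.0, -1.0])
b = 0.0
print(predict(np.array([2.0, 1.0]), w, b))  # -> 1
print(predict(np.array([0.0, 1.0]), w, b))  # -> 0
```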
ARTIFICIAL NEURON 15
Topics: capacity of single neuron
• Can solve linearly separable problems
[Figure 1.8, caption translated from French: "Example of modeling XOR with a one-hidden-layer network. Top, from left to right: illustration of the Boolean functions OR(x1, x2), AND(x1, ¬x2) and AND(¬x1, x2). Bottom: illustration of the function XOR(x1, x2) as a function of the values of x1 and x2 (left), then of AND(x1, ¬x2) and AND(¬x1, x2) (right). Points drawn as a circle or a triangle belong to class 0 or 1, respectively. Observe that, although a linear classifier can solve the classification problems associated with OR and AND, it cannot in the case of XOR. However, by using the values of AND(x1, ¬x2) and AND(¬x1, x2) as a new representation of the input (x1, x2), the XOR classification problem can then be solved linearly. Note that in this last case there are only three possible values of this new representation, since AND(x1, ¬x2) and AND(¬x1, x2) cannot both be true for the same input."]
ARTIFICIAL NEURON 16
Topics: capacity of single neuron
• Can't solve non linearly separable problems...
• ... unless the input is transformed into a better representation (see the sketch below)
[Figure: the same XOR illustration (Figure 1.8) as in the previous slide.]
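A tiny sketch of the construction in Figure 1.8 (my own, using hard-threshold neurons rather than sigmoids): with the two AND features as a new representation, XOR becomes linearly separable.

```python
import numpy as np

def neuron(x, w, b):
    # Hard-threshold neuron: fires iff b + w^T x > 0
    return int(b + np.dot(w, x) > 0)

def xor_net(x1, x2):
    # Hidden layer: two AND features of the input.
    h1 = neuron(np.array([x1, x2]), np.array([1.0, -1.0]), -0.5)   # AND(x1, not x2)
    h2 = neuron(np.array([x1, x2]), np.array([-1.0, 1.0]), -0.5)   # AND(not x1, x2)
    # Output layer: OR of the two features, now linearly separable.
    return neuron(np.array([h1, h2]), np.array([1.0, 1.0]), -0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0, 1, 1, 0
```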
Neural networks: Feedforward neural network - multilayer neural network
NEURAL NETWORK 18
Topics: single hidden layer neural network
• Hidden layer pre-activation: a(x) = b^{(1)} + W^{(1)} x   (i.e. a(x)_i = b^{(1)}_i + \sum_j W^{(1)}_{i,j} x_j)
• Hidden layer activation: h(x) = g(a(x))
• Output layer activation: f(x) = o(b^{(2)} + w^{(2)\top} h^{(1)}(x)), where o(·) is the output activation function
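A minimal NumPy sketch (my own illustration) of this forward pass, with tanh hidden units and a sigmoid output as example choices:

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, w2, b2, g=np.tanh, o=sigm):
    # Hidden layer pre-activation: a(x) = b^(1) + W^(1) x
    a = b1 + W1 @ x
    # Hidden layer activation: h(x) = g(a(x))
    h = g(a)
    # Output layer activation: f(x) = o(b^(2) + w^(2)^T h^(1)(x))
    return o(b2 + np.dot(w2, h))

rng = np.random.default_rng(1)
d, m = 4, 5                        # input size, number of hidden units
x = rng.normal(size=d)
W1 = rng.normal(size=(m, d)); b1 = np.zeros(m)
w2 = rng.normal(size=m);      b2 = 0.0
print(forward(x, W1, b1, w2, b2))
```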
NEURAL NETWORK 19
Topics: softmax activation function
• For multi-class classification:
‣ we need multiple outputs (1 output per class)
‣ we would like to estimate the conditional probability p(y = c | x)
• We use the softmax activation function at the output:
  o(a) = \mathrm{softmax}(a) = \left[ \frac{\exp(a_1)}{\sum_c \exp(a_c)} \; \cdots \; \frac{\exp(a_C)}{\sum_c \exp(a_c)} \right]^\top
‣ strictly positive
‣ sums to one
• Predicted class is the one with highest estimated probability
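A short sketch of the softmax; subtracting the maximum is a standard numerical-stability trick, not something the slide discusses:

```python
import numpy as np

def softmax(a):
    # o(a)_c = exp(a_c) / sum_c' exp(a_c'); subtracting max(a) leaves the
    # result unchanged but avoids overflow in exp for large pre-activations.
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a = np.array([2.0, 1.0, -1.0])
p = softmax(a)
print(p, p.sum())                    # strictly positive, sums to one
print("predicted class:", int(np.argmax(p)))
```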
NEURAL NETWORK 20
Topics: multilayer neural network
• Could have L hidden layers:
‣ layer pre-activation for k > 0: a^{(k)}(x) = b^{(k)} + W^{(k)} h^{(k-1)}(x)   (with h^{(0)}(x) = x)
‣ hidden layer activation (k from 1 to L): h^{(k)}(x) = g(a^{(k)}(x))
‣ output layer activation (k = L + 1): h^{(L+1)}(x) = o(a^{(L+1)}(x)) = f(x)
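A sketch (my own) of the corresponding L-layer forward pass; the layer sizes and the linear output are arbitrary example choices:

```python
import numpy as np

def forward(x, weights, biases, g=np.tanh, o=lambda a: a):
    # weights = [W^(1), ..., W^(L+1)], biases = [b^(1), ..., b^(L+1)]
    h = x                                    # h^(0)(x) = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = g(b + W @ h)                     # h^(k)(x) = g(a^(k)(x)), k = 1..L
    return o(biases[-1] + weights[-1] @ h)   # f(x) = o(a^(L+1)(x))

rng = np.random.default_rng(2)
sizes = [4, 5, 5, 3]                         # input, two hidden layers, output
weights = [rng.normal(size=(n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(forward(rng.normal(size=4), weights, biases))
```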
Neural networks: Feedforward neural network - capacity of neural network
ARTIFICIAL NEURON 22
Topics: capacity of single neuron
• Can't solve non linearly separable problems...
• ... unless the input is transformed into a better representation
[Figure: the same XOR illustration (Figure 1.8) as in slide 16.]
NEURAL NETWORK 23
Topics: single hidden layer neural network
• Hidden layer pre-activation: a(x) = b^{(1)} + W^{(1)} x   (i.e. a(x)_i = b^{(1)}_i + \sum_j W^{(1)}_{i,j} x_j)
• Hidden layer activation: h(x) = g(a(x))
• Output layer activation: f(x) = o(b^{(2)} + w^{(2)\top} h^{(1)}(x)), where o(·) is the output activation function
CAPACITY OF NEURAL NETWORK 24
Topics: single hidden layer neural network
[Figure: the small network from before — inputs x1, x2, hidden units y1, y2, output z, with weight values on the connections — together with the decision regions R1, R2 it produces in the (x1, x2) plane. From Pascal Vincent's slides.]
CAPACITY OF NEURAL NETWORK 25
Topics: single hidden layer neural network
[Figure: "The expressive power of neural networks" (translated from French) — a network with four hidden units y1, ..., y4 and output z1, together with the region of the (x1, x2) plane it carves out. From Pascal Vincent's slides.]
CAPACITY OF NEURAL NETWORK 26
Topics: single hidden layer neural network
[Figure: "The expressive power of neural networks" (translated from French) — decision regions R1, R2 in the (x1, x2) plane obtained with two-layer ("deux couches") and three-layer ("trois couches") networks. From Pascal Vincent's slides.]
CAPACITY OF NEURAL NETWORK 27
Topics: universal approximation
• Universal approximation theorem (Hornik, 1991):
‣ "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units"
• The result applies for sigmoid, tanh and many other hidden layer activation functions
• This is a good result, but it doesn't mean there is a learning algorithm that can find the necessary parameter values!
Neural networks: Feedforward neural network - biological inspiration
NEURAL NETWORK 29
Topics: multilayer neural network
• Could have L hidden layers:
‣ layer pre-activation for k > 0: a^{(k)}(x) = b^{(k)} + W^{(k)} h^{(k-1)}(x)   (with h^{(0)}(x) = x)
‣ hidden layer activation (k from 1 to L): h^{(k)}(x) = g(a^{(k)}(x))
‣ output layer activation (k = L + 1): h^{(L+1)}(x) = o(a^{(L+1)}(x)) = f(x)
NEURAL NETWORK 30
Topics: parallel with the visual cortex
• The human visual system: why not take inspiration from the brain to do vision?
[Figure: a feature hierarchy for images of faces — edges first, then parts such as the eyes, nose and mouth, then a whole face.]
BIOLOGICAL NEURONS 31
Topics: synapse, axon, dendrite
• The number of neurons in the human brain is estimated at between 10^10 and 10^11:
‣ they receive information from other neurons through their dendrites
‣ they "process" the information in their cell body (soma)
‣ they send information through a "cable" called an axon
‣ the points of connection between the axon branches and other neurons' dendrites are called synapses
BIOLOGICAL NEURONS 32
Topics: synapse, axon, dendrite
[Fig. 3.1: A schematic diagram of information-processing in a neuron — signal reception at the dendrites, computation in the cell body, signal transmission along the axon to synapses with other neurons. Flow of information is from right to left.]
[Fig. 3.2: Neurons (thick bodies, some lettered), each with one axon (thicker line going up) for sending out the signal, and many dendrites (thinner lines) for receiving signals. Drawing by Santiago Ramon y Cajal in 1900.]
(from Hyvärinen, Hurri and Hoyer's book)
BIOLOGICAL NEURONS 33
Topics: action potential, firing rate
• An action potential is an electrical impulse that travels through the axon:
‣ this is how neurons communicate
‣ it generates a "spike" in the electric potential (voltage) of the axon
‣ an action potential is generated at a neuron only if it receives enough (over some threshold) of the "right" pattern of spikes from other neurons
• Neurons can generate several such spikes every second:
‣ the frequency of the spikes, called the firing rate, is what characterizes the activity of a neuron
‣ neurons are always firing a little bit (the spontaneous firing rate), but they will fire more given the right stimulus
BIOLOGICAL NEURONS 34
Topics: action potential, firing rate
• The firing rates of different input neurons combine to influence the firing rate of other neurons:
‣ depending on the dendrite and axon, a neuron can work either to increase (excite) or to decrease (inhibit) the firing rate of another neuron
• This is what artificial neurons approximate:
‣ the activation corresponds to a "sort of" firing rate
‣ the weights between neurons model whether neurons excite or inhibit each other
‣ the activation function and bias model the thresholded behavior of action potentials
BIOLOGICAL NEURONS 35
Topics: Hubel & Wiesel experiment
• Video: http://www.youtube.com/watch?v=8VdFf3egwfg&feature=related