01 Artificial Neuron

Neural networks: Feedforward neural network - artificial neuron
Hugo Larochelle, Département d'informatique, Université de Sherbrooke


Page 1:

Neural networks: Feedforward neural network - artificial neuron

Page 2:

ARTIFICIAL NEURON 2

Topics: connection weights, bias, activation function
• Neuron pre-activation (or input activation): a(x) = b + Σ_i w_i x_i = b + w^T x
• Neuron (output) activation: h(x) = g(a(x)) = g(b + Σ_i w_i x_i)
• w_1, ..., w_d are the connection weights
• b is the neuron bias
• g(·) is called the activation function
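A minimal NumPy sketch of this neuron, assuming a sigmoid activation; the input, weight, and bias values below are illustrative, not from the slides:

```python
import numpy as np

def neuron(x, w, b, g):
    """Artificial neuron: pre-activation a(x) = b + w^T x, output h(x) = g(a(x))."""
    a = b + np.dot(w, x)  # neuron pre-activation (input activation)
    return g(a)           # neuron (output) activation

sigm = lambda a: 1.0 / (1.0 + np.exp(-a))  # sigmoid activation function

x = np.array([0.5, -1.0, 2.0])   # input x_1 ... x_d (illustrative)
w = np.array([0.1, 0.4, -0.3])   # connection weights w_1 ... w_d (illustrative)
b = 0.2                          # neuron bias
print(neuron(x, w, b, sigm))     # a single scalar activation in (0, 1)
```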


Page 3:

ARTIFICIAL NEURON 3

Topics: connection weights, bias, activation function
[Figure: a small network over inputs x1, x2 (weights 1, 1, .5, -1.5, .7, -.4, -1) with its output surface and decision regions z = +1 / z = -1 in the (x1, x2) plane; labels: output ("sortie") k, input ("entrée") i, hidden ("cachée") j, bias ("biais"); from Pascal Vincent's slides]
• The output range is determined by g(·)
• The bias b only changes the position of the ridge

Page 4:

Neural networks: Feedforward neural network - activation function

Page 5:

ARTIFICIAL NEURON 5

Topics: connection weights, bias, activation function
• Neuron pre-activation (or input activation): a(x) = b + Σ_i w_i x_i = b + w^T x
• Neuron (output) activation: h(x) = g(a(x)) = g(b + Σ_i w_i x_i)
• w_1, ..., w_d are the connection weights
• b is the neuron bias
• g(·) is called the activation function

Page 6:

ACTIVATION FUNCTION 6

Topics: linear activation function
• g(a) = a
• Performs no input squashing
• Not very interesting...

Page 7:

ACTIVATION FUNCTION 7

Topics: sigmoid activation function
• g(a) = sigm(a) = 1 / (1 + exp(-a))
• Squashes the neuron's pre-activation between 0 and 1
• Always positive
• Bounded
• Strictly increasing

Page 8:

ACTIVATION FUNCTION 8

Topics: hyperbolic tangent ("tanh") activation function
• g(a) = tanh(a) = (exp(a) - exp(-a)) / (exp(a) + exp(-a)) = (exp(2a) - 1) / (exp(2a) + 1)
• Squashes the neuron's pre-activation between -1 and 1
• Can be positive or negative
• Bounded
• Strictly increasing

Page 9:

ACTIVATION FUNCTION 9

Topics: rectified linear activation function
• g(a) = reclin(a) = max(0, a)
• Bounded below by 0 (always non-negative)
• Not upper bounded
• Monotonically increasing (constant at 0 for negative pre-activations)
• Tends to give neurons with sparse activities
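A short NumPy sketch of the four activation functions from the last slides (the function names and test values are mine):

```python
import numpy as np

def linear(a):   # g(a) = a: performs no squashing
    return a

def sigm(a):     # g(a) = 1 / (1 + exp(-a)): output in (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):     # g(a) = (exp(2a) - 1) / (exp(2a) + 1): output in (-1, 1)
    return np.tanh(a)

def reclin(a):   # g(a) = max(0, a): non-negative, unbounded above
    return np.maximum(0.0, a)

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # sample pre-activations
for g in (linear, sigm, tanh, reclin):
    print(g.__name__, g(a))
```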

Page 10:

ACTIVATION FUNCTION 10

Topics: maxout activation function
• g(x) = max_{j ∈ [1,k]} a_j, where a_j = b_j + w_j^T x
• Not lower / upper bounded
• Does not give neurons with sparse activities
• But gradients are sparse
[Figure: a maxout unit's response h_i(x) as a function of x can implement a rectifier, an absolute value, or an (approximately) quadratic shape]
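A minimal sketch of a maxout unit, assuming its k affine pieces are stacked in a matrix; picking slopes +1 and -1 with zero biases reproduces the absolute-value panel from the figure:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: g(x) = max_j (b_j + w_j^T x) over k affine pieces.
    W has shape (k, d), b has shape (k,)."""
    return np.max(W @ x + b, axis=0)

# Two pieces with slopes +1 and -1 and zero bias give |x| in 1-D (illustrative)
W = np.array([[1.0], [-1.0]])
b = np.array([0.0, 0.0])
for v in (-2.0, 0.5, 3.0):
    print(v, maxout(np.array([v]), W, b))  # prints |v|
```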

Page 11:

ACTIVATION FUNCTION 11

Topics: modern activation functions
• Most models (approx. 95%) use ReLUs
• Exception: not with recurrent neural networks
‣ but see Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton (2015) - https://arxiv.org/pdf/1504.00941v2.pdf

Page 12:

Neural networks: Feedforward neural network - capacity of single neuron

Page 13:

ARTIFICIAL NEURON 13

Topics: connection weights, bias, activation function
[Figure: same network diagram and neuron-output plots as Page 3; from Pascal Vincent's slides]
• The output range is determined by g(·)
• The bias b only changes the position of the ridge

Page 14:

ARTIFICIAL NEURON 14

Topics: capacity, decision boundary of neuron
• Could do binary classification:
‣ with sigmoid, can interpret the neuron as estimating p(y = 1 | x)
‣ also known as a logistic regression classifier
‣ if greater than 0.5, predict class 1
‣ otherwise, predict class 0 (a similar idea can apply with tanh)
• The decision boundary is linear
[Figure: decision regions in the (x1, x2) plane; from Pascal Vincent's slides]
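A sketch of this decision rule, assuming already-trained weights (the values below are arbitrary):

```python
import numpy as np

def predict_class(x, w, b):
    """Sigmoid neuron as a logistic regression classifier:
    h(x) = sigm(b + w^T x) estimates p(y = 1 | x)."""
    p = 1.0 / (1.0 + np.exp(-(b + np.dot(w, x))))
    # thresholding p at 0.5 is the same as thresholding b + w^T x at 0,
    # so the decision boundary b + w^T x = 0 is linear in x
    return 1 if p > 0.5 else 0

w = np.array([2.0, -1.0])  # arbitrary "trained" weights (illustrative)
b = -0.5
print(predict_class(np.array([1.0, 0.5]), w, b))  # 1, since b + w^T x = 1.0 > 0
```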

Page 15:

ARTIFICIAL NEURON 15

Topics: capacity of single neuron
• Can solve linearly separable problems
[Figure 1.8 - Example of modelling XOR with a one-hidden-layer network. Top, from left to right: the boolean functions OR(x1, x2), AND(x1, x̄2) and AND(x̄1, x2). Bottom: the function XOR(x1, x2) as a function of the values of x1 and x2 (left), then of AND(x1, x̄2) and AND(x̄1, x2) (right). Points drawn as a circle or a triangle belong to class 0 or class 1, respectively. Although a linear classifier can solve the classification problems associated with the OR and AND functions, it cannot solve XOR. However, by using the values of AND(x1, x̄2) and AND(x̄1, x2) as a new representation of the input (x1, x2), the XOR classification problem can be solved linearly. Note that in this latter case the new representation can only take three values, since AND(x1, x̄2) and AND(x̄1, x2) cannot both be true for the same input.]

Page 16:

ARTIFICIAL NEURON 16

Topics: capacity of single neuron
• Can't solve non-linearly separable problems...
• ... unless the input is transformed into a better representation
[Figure 1.8 repeated; see Page 15]
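A small sketch of the representation trick in Figure 1.8: XOR is not linearly separable in (x1, x2), but it becomes linearly separable in the two AND features (all weights below are hand-picked for illustration):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR(x1, x2): no single linear neuron can fit this

def and_features(x):
    # New representation: AND(x1, not x2) and AND(not x1, x2)
    x1, x2 = x
    return np.array([x1 * (1 - x2), (1 - x1) * x2])

# In the new representation, XOR becomes OR, which one linear neuron computes:
# predict 1 iff h1 + h2 - 0.5 > 0
w, b = np.array([1.0, 1.0]), -0.5
for x, t in zip(X, y):
    h = and_features(x)
    pred = int(b + w @ h > 0)
    print(x, "->", pred, "(target", t, ")")  # matches XOR on all four inputs
```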

Page 17:

Neural networks: Feedforward neural network - multilayer neural network

Page 18:

NEURAL NETWORK 18

Topics: single hidden layer neural network
• Hidden layer pre-activation: a(x) = b^(1) + W^(1) x, i.e. a(x)_i = b^(1)_i + Σ_j W^(1)_{i,j} x_j
• Hidden layer activation: h(x) = g(a(x))
• Output layer activation: f(x) = o(b^(2) + (w^(2))^T h^(1)(x)), where o(·) is the output activation function
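A sketch of this single-hidden-layer forward pass, assuming tanh hidden units and a sigmoid output; the shapes and parameter values are illustrative:

```python
import numpy as np

def forward(x, W1, b1, w2, b2, g, o):
    """f(x) = o(b^(2) + w^(2)^T h^(1)(x)) with h^(1)(x) = g(b^(1) + W^(1) x)."""
    a = b1 + W1 @ x          # hidden layer pre-activation a(x)
    h = g(a)                 # hidden layer activation h(x)
    return o(b2 + w2 @ h)    # output layer activation f(x)

rng = np.random.default_rng(0)
d, m = 3, 4                  # input dimension, number of hidden units (illustrative)
W1, b1 = rng.normal(size=(m, d)), np.zeros(m)
w2, b2 = rng.normal(size=m), 0.0
sigm = lambda a: 1.0 / (1.0 + np.exp(-a))
print(forward(rng.normal(size=d), W1, b1, w2, b2, np.tanh, sigm))
```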

Page 19:

NEURAL NETWORK 19

Topics: softmax activation function
• For multi-class classification:
‣ we need multiple outputs (1 output per class)
‣ we would like to estimate the conditional probability p(y = c | x)
• We use the softmax activation function at the output:
o(a) = softmax(a) = [exp(a_1) / Σ_c exp(a_c), ..., exp(a_C) / Σ_c exp(a_c)]^T
‣ strictly positive
‣ sums to one
• Predicted class is the one with highest estimated probability
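A sketch of the softmax output; the max-subtraction is a standard numerical-stability shift of my own, not on the slide, and does not change the result:

```python
import numpy as np

def softmax(a):
    """softmax(a)_c = exp(a_c) / sum_c' exp(a_c'): strictly positive, sums to one."""
    e = np.exp(a - np.max(a))  # shift by the max for numerical stability
    return e / e.sum()

a = np.array([2.0, 0.5, -1.0])   # illustrative output pre-activations
p = softmax(a)
print(p, p.sum())                # estimated class probabilities, summing to 1
print(int(np.argmax(p)))         # predicted class: highest estimated probability
```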

Page 20:

NEURAL NETWORK 20

Topics: multilayer neural network
• Could have L hidden layers:
‣ layer pre-activation for k > 0: a^(k)(x) = b^(k) + W^(k) h^(k-1)(x) (with h^(0)(x) = x)
‣ hidden layer activation (k from 1 to L): h^(k)(x) = g(a^(k)(x))
‣ output layer activation (k = L+1): h^(L+1)(x) = o(a^(L+1)(x)) = f(x)
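A sketch of this recursion in NumPy, assuming the parameters are given as a list of (W, b) pairs; the layer sizes and random values below are illustrative:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(x, params, g, o):
    """params = [(W^(1), b^(1)), ..., (W^(L+1), b^(L+1))], with h^(0)(x) = x.
    Hidden layers: h^(k) = g(b^(k) + W^(k) h^(k-1)); output: f(x) = o(a^(L+1))."""
    h = x
    for W, b in params[:-1]:
        h = g(b + W @ h)       # hidden layer activations, k = 1..L
    W, b = params[-1]
    return o(b + W @ h)        # output layer activation, k = L+1

rng = np.random.default_rng(1)
sizes = [3, 5, 4, 2]           # input, L = 2 hidden layers, C = 2 outputs (illustrative)
params = [(rng.normal(size=(n, m)), np.zeros(n)) for m, n in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=3), params, np.tanh, softmax))
```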

Page 21:

Neural networks: Feedforward neural network - capacity of neural network

Page 22:

ARTIFICIAL NEURON 22

Topics: capacity of single neuron
• Can't solve non-linearly separable problems...
• ... unless the input is transformed into a better representation
[Figure 1.8 repeated; see Page 15]

Page 23:

NEURAL NETWORK 23

Topics: single hidden layer neural network
• Hidden layer pre-activation: a(x) = b^(1) + W^(1) x, i.e. a(x)_i = b^(1)_i + Σ_j W^(1)_{i,j} x_j
• Hidden layer activation: h(x) = g(a(x))
• Output layer activation: f(x) = o(b^(2) + (w^(2))^T h^(1)(x)), where o(·) is the output activation function

Page 24:

CAPACITY OF NEURAL NETWORK 24

Topics: single hidden layer neural network
[Figure: a single hidden layer network on inputs x1, x2 - hidden units y1, y2 with weights w_ji, output z_k with weights w_kj, and the decision regions R1, R2 that each unit carves out of the (x1, x2) plane; labels: output ("sortie") k, input ("entrée") i, hidden ("cachée") j, bias ("biais"); from Pascal Vincent's slides]

Page 25:

CAPACITY OF NEURAL NETWORK 25

Topics: single hidden layer neural network
[Figure: the expressive power of neural networks - hidden units y1 ... y4 over inputs (x1, x2) combine into an output z1 that carves a closed region in the (x1, x2) plane; from Pascal Vincent's slides]

Page 26:

CAPACITY OF NEURAL NETWORK 26

Topics: single hidden layer neural network
[Figure: the expressive power of neural networks - decision regions R1, R2 over (x1, x2) achievable with two layers ("deux couches") versus three layers ("trois couches"); from Pascal Vincent's slides]

Page 27:

CAPACITY OF NEURAL NETWORK 27

Topics: universal approximation
• Universal approximation theorem (Hornik, 1991):
‣ "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units"
• The result applies for sigmoid, tanh and many other hidden layer activation functions
• This is a good result, but it doesn't mean there is a learning algorithm that can find the necessary parameter values!

Page 28:

Neural networks: Feedforward neural network - biological inspiration

Page 29:

NEURAL NETWORK 29

Topics: multilayer neural network
• Could have L hidden layers:
‣ layer pre-activation for k > 0: a^(k)(x) = b^(k) + W^(k) h^(k-1)(x) (with h^(0)(x) = x)
‣ hidden layer activation (k from 1 to L): h^(k)(x) = g(a^(k)(x))
‣ output layer activation (k = L+1): h^(L+1)(x) = o(a^(L+1)(x)) = f(x)

Page 30:

NEURAL NETWORK 30

Topics: parallel with the visual cortex
• The human visual system: why not take inspiration from the brain to do vision!
[Figure: feature hierarchy in the visual system - edges, then parts (eyes, nose, mouth), then a face]

Page 31:

BIOLOGICAL NEURONS 31

Topics: synapse, axon, dendrite
• The human brain is estimated to contain between 10^10 and 10^11 neurons:
‣ they receive information from other neurons through their dendrites
‣ they "process" the information in their cell body (soma)
‣ they send information through a "cable" called an axon
‣ the points of connection between the axon branches and other neurons' dendrites are called synapses

Page 32:

BIOLOGICAL NEURONS 32

Topics: synapse, axon, dendrite
[Fig. 3.1: A schematic diagram of information-processing in a neuron - signal reception at the dendrites, computation in the cell body, signal transmission along the axon through synapses to other neurons; flow of information is from right to left]
[Fig. 3.2: Neurons (thick bodies, some lettered), each with one axon (thicker line going up) for sending out the signal, and many dendrites (thinner lines) for receiving signals; drawing by Santiago Ramón y Cajal in 1900]
(from Hyvärinen, Hurri and Hoyer's book)

Page 33:

BIOLOGICAL NEURONS 33

Topics: action potential, firing rate
• An action potential is an electrical impulse that travels through the axon:
‣ this is how neurons communicate
‣ it generates a "spike" in the electric potential (voltage) of the axon
‣ an action potential is generated at a neuron only if it receives enough (over some threshold) of the "right" pattern of spikes from other neurons
• Neurons can generate several such spikes every second:
‣ the frequency of the spikes, called the firing rate, is what characterizes the activity of a neuron
- neurons are always firing a little bit (the spontaneous firing rate), but they will fire more given the right stimulus

Page 34:

BIOLOGICAL NEURONS 34

Topics: action potential, firing rate
• Firing rates of different input neurons combine to influence the firing rate of other neurons:
‣ depending on the dendrite and axon, a neuron can work either to increase (excite) or to decrease (inhibit) the firing rate of another neuron
• This is what artificial neurons approximate:
‣ the activation corresponds to a "sort of" firing rate
‣ the weights between neurons model whether neurons excite or inhibit each other
‣ the activation function and bias model the thresholded behavior of action potentials

Page 35:

BIOLOGICAL NEURONS 35

Topics: Hubel & Wiesel experiment
http://www.youtube.com/watch?v=8VdFf3egwfg&feature=related