
Artificial Neural Networks

[Diagram: a small network with inputs x1, x2, x3]

What are they?
Inspired by the human brain. The human brain has about 86 billion neurons and requires about 20% of your body's energy to function. These neurons are connected by roughly 100 trillion to 1 quadrillion synapses.

What are they?
[Diagram: a human and a machine side by side, each mapping input to output]

What are they?
1. Originally developed by Warren McCulloch and Walter Pitts (1943).
2. First trainable network, the perceptron, proposed by Frank Rosenblatt (1957).
3. Started off as an unsupervised learning tool.
   1. Had problems with computing time and could not compute XOR.
   2. Was abandoned in favor of other algorithms.
4. Werbos's (1975) backpropagation algorithm:
   1. Incorporated supervision and solved XOR.
   2. But was still too slow compared with other algorithms, e.g., support vector machines.
5. Backpropagation was accelerated by GPUs in 2010 and shown to be more efficient and cost-effective.

GPUs
GPUs handle parallel operations much better (they run thousands of threads at once), although each individual core is slower than a CPU core. The matrix multiplication steps in ANNs can be run in parallel, resulting in considerable time and cost savings. The best CPUs handle about 50 GB/s of memory bandwidth, while the best GPUs handle about 750 GB/s.

                    CPU (i9 X-series)     GPU (GeForce GTX 1080)
Cores               18 (36 threads)       2560
Clock speed (GHz)   4.4                   1.6
Memory              shared (system RAM)   8 GB
Price ($)           1799                  549
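To make the parallelism point concrete, here is a minimal NumPy sketch (added here, not from the original slides) of a fully connected layer written as a single matrix multiplication; the layer sizes and variable names are illustrative assumptions. On a GPU-backed array library, the same expression is dispatched across thousands of threads.

```python
import numpy as np

# A fully connected layer is y = g(x @ W + b): one matrix multiply
# plus an elementwise activation, both highly parallel operations.
rng = np.random.default_rng(0)

batch = rng.standard_normal((64, 784))   # 64 input vectors (assumed sizes)
W = rng.standard_normal((784, 256))      # weight matrix for a 784 -> 256 layer
b = np.zeros(256)                        # bias vector

z = batch @ W + b                        # every output entry is independent
h = 1.0 / (1.0 + np.exp(-z))             # sigmoid activation, also elementwise
print(h.shape)                           # (64, 256)
```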

What are they?
[Diagram repeated: a human and a machine, each mapping input to output]

Neuron

[Diagram: a single neuron with inputs x0 (the bias input), x1, x2, weights w0 = −10, w1 = 20, w2 = 20, and output y]

z = x0w0 + x1w1 + x2w2 = Σ(i=0 to 2) wi xi

y = g(z) = g(w0x0 + w1x1 + w2x2), where g is the activation function

Activation function

[Same neuron: inputs x0 (bias), x1, x2 with weights w0 = −10, w1 = 20, w2 = 20]

z = w0x0 + w1x1 + w2x2 = Σ(i=0 to 2) wi xi

y = g(z), where g is the activation function:

g(z) = 1 / (1 + e^(−z))
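To make the formulas concrete, here is a small Python sketch (added, not part of the slides) of this single sigmoid neuron with the weights shown above; the bias input x0 is fixed at 1, and the function names are my own. With w0 = −10 and w1 = w2 = 20, binary inputs give an output close to x1 OR x2.

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1, x2, w0=-10.0, w1=20.0, w2=20.0):
    x0 = 1.0                          # bias input is fixed at 1
    z = w0 * x0 + w1 * x1 + w2 * x2   # weighted sum of inputs
    return sigmoid(z)                 # y = g(z)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(neuron(x1, x2), 4))
# Output is ~0 only for (0, 0) and ~1 otherwise, i.e. a soft OR gate.
```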

Activation Functions

■ Sigmoid: g(z) = 1 / (1 + e^(−z))

■ ReLU: r(z) = max(0, z)

■ Hyperbolic tangent: tanh(z) = 2 / (1 + e^(−2z)) − 1

■ Leaky ReLU: lr(z) = αz for z < 0; z for z ≥ 0
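A short NumPy sketch (added) implementing the four activation functions exactly as written above; the Leaky ReLU slope α = 0.01 is an assumed typical value, since the slide leaves α unspecified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0   # equivalent to np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # alpha (assumed 0.01 here) is the slope used for negative inputs
    return np.where(z < 0, alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), sep="\n")
```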

Neural Network

[Diagram: a fully connected network with inputs x1, x2, x3 feeding an input layer, hidden layers (with bias units b0, b1), and an output layer mapping input to output]

XOR Gate

X  Y  | output
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 0

[Diagram: a 2-2-1 network for XOR. Inputs x and y feed hidden neurons h0 = g(z0) and h1 = g(z1) through weights w0–w3; the hidden neurons feed the output neuron h2 = g(z2) through weights w4 and w5. z is the network output and za is the actual (target) output.]

XOR Gate

[Same XOR truth table and network, now with initial weights:]

w0 = 0.4, w1 = 0.6, w2 = 0.3, w3 = 0.8, w4 = 0.7, w5 = 0.1

output = h2 = z

Feedforward propagation

[Same XOR network and weights; input x = 1, y = 1]

activation function, g(z) = 1 / (1 + e^(−z))

z0 = w0x + w2y = 0.4 × 1 + 0.3 × 1 = 0.7
z1 = w1x + w3y = 0.6 × 1 + 0.8 × 1 = 1.4

h0 = g(z0) = 1 / (1 + e^(−0.7)) = 0.67
h1 = g(z1) = 1 / (1 + e^(−1.4)) = 0.8

z2 = w4h0 + w5h1 = 0.7 × 0.67 + 0.1 × 0.8 = 0.63
h2 = g(z2) = 1 / (1 + e^(−0.63)) = 0.65

h2 = z = 0.65
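For readers who want to run the example, here is a minimal Python sketch (added, not from the slides) of the same feedforward pass with the weights above; the function layout and names are my own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feedforward(x, y, w):
    # w holds the six weights from the slides
    z0 = w["w0"] * x + w["w2"] * y        # input to first hidden neuron
    z1 = w["w1"] * x + w["w3"] * y        # input to second hidden neuron
    h0, h1 = sigmoid(z0), sigmoid(z1)     # hidden activations
    z2 = w["w4"] * h0 + w["w5"] * h1      # input to output neuron
    h2 = sigmoid(z2)                      # network output z
    return z0, z1, h0, h1, z2, h2

weights = {"w0": 0.4, "w1": 0.6, "w2": 0.3, "w3": 0.8, "w4": 0.7, "w5": 0.1}
print(feedforward(1, 1, weights))         # h0 ≈ 0.67, h1 ≈ 0.80
```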

Feedforward propagation

[Same feedforward pass as above: z0 = 0.7, z1 = 1.4, h0 = 0.67, h1 = 0.8, z2 = 0.63, h2 = z = 0.65; input x = 1, y = 1, target za = 0]

Mean Squared Error, E = ½ (z − za)² = ½ (0.65 − 0)² = 0.21
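And a one-function sketch (added) of the error measure as the slide defines it:

```python
def mse(z, za):
    # E = 1/2 * (z - za)^2, the slide's single-example squared error
    return 0.5 * (z - za) ** 2

print(mse(0.65, 0))   # 0.21125, which the slide rounds to 0.21
```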

Backpropagation

[Same network, feedforward values, and error E = ½ (z − za)² = ½ (0.65 − 0)² = 0.21]

∂E/∂w4 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂w4) = 0.65 × 0.228 × 0.67 = 0.09

∂E/∂h2 = ∂/∂h2 [½ (h2 − za)²] = h2 − za = 0.65

∂h2/∂z2 = ∂/∂z2 g(z2) = g(z2)(1 − g(z2)) = h2(1 − h2) = 0.65 × 0.35 = 0.228
  (in general, g′(z) = ∂/∂z [1 / (1 + e^(−z))] = g(z)[1 − g(z)])

∂z2/∂w4 = ∂/∂w4 [w4h0 + w5h1] = h0 = 0.67
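The same chain rule written as a small Python sketch (added, not from the slides) for both output-layer weights, taking the slide's rounded values of h0, h1, and h2 as inputs:

```python
# Rounded values carried over from the feedforward slide
h0, h1, h2 = 0.67, 0.8, 0.65
za = 0                        # target output for input (1, 1)

dE_dh2 = h2 - za              # dE/dh2 = h2 - za
dh2_dz2 = h2 * (1 - h2)       # sigmoid derivative g'(z2) = h2(1 - h2)

dE_dw4 = dE_dh2 * dh2_dz2 * h0   # dz2/dw4 = h0
dE_dw5 = dE_dh2 * dh2_dz2 * h1   # dz2/dw5 = h1
print(dE_dw4, dE_dw5)            # ≈ 0.099 and ≈ 0.118
```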

Backpropagation

[Same network, feedforward values, and error E = 0.21 as before]

∂E/∂w5 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂w5) = 0.65 × 0.228 × 0.8 = 0.118

∂E/∂h2 = ∂/∂h2 [½ (h2 − za)²] = h2 − za = 0.65

∂h2/∂z2 = g(z2)(1 − g(z2)) = h2(1 − h2) = 0.65 × 0.35 = 0.228

∂z2/∂w5 = ∂/∂w5 [w4h0 + w5h1] = h1 = 0.8

Backpropagation

[Same network, feedforward values, and error E = 0.21 as before]

∂E/∂w0 = (∂E/∂h0) · (∂h0/∂z0) · (∂z0/∂w0) = 0.103 × 0.221 × 1 = 0.0228

∂E/∂h0 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂h0) = (h2 − za) × h2(1 − h2) × w4 = 0.65 × 0.65(1 − 0.65) × 0.7 = 0.103

∂h0/∂z0 = ∂/∂z0 g(z0) = g(z0)(1 − g(z0)) = h0(1 − h0) = 0.67 × 0.33 = 0.221

∂z0/∂w0 = ∂/∂w0 [w0x + w2y] = x = 1
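Extending the sketch (again added, not part of the slides) to the hidden-layer weights: the error is first pushed back through the output neuron to get ∂E/∂h0 and ∂E/∂h1, then through each hidden neuron's sigmoid derivative.

```python
# Rounded values carried over from the feedforward slide
x, y = 1, 1
h0, h1, h2 = 0.67, 0.8, 0.65
w4, w5 = 0.7, 0.1
za = 0

delta_out = (h2 - za) * h2 * (1 - h2)    # dE/dz2, shared by all paths

dE_dh0 = delta_out * w4                  # dE/dh0 = dE/dz2 * dz2/dh0
dE_dh1 = delta_out * w5                  # dE/dh1 = dE/dz2 * dz2/dh1

dE_dw0 = dE_dh0 * h0 * (1 - h0) * x      # dz0/dw0 = x
dE_dw2 = dE_dh0 * h0 * (1 - h0) * y      # dz0/dw2 = y
dE_dw1 = dE_dh1 * h1 * (1 - h1) * x      # dz1/dw1 = x
dE_dw3 = dE_dh1 * h1 * (1 - h1) * y      # dz1/dw3 = y
print(dE_dw0, dE_dw1, dE_dw2, dE_dw3)
```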

Backpropagation

[Same network, feedforward values, and error E = 0.21 as before]

∂E/∂w2 = (∂E/∂h0) · (∂h0/∂z0) · (∂z0/∂w2) = 0.103 × 0.221 × 1 = 0.0228

∂E/∂h0 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂h0) = (h2 − za) × h2(1 − h2) × w4 = 0.65 × 0.65(1 − 0.65) × 0.7 = 0.103

∂h0/∂z0 = g(z0)(1 − g(z0)) = h0(1 − h0) = 0.67 × 0.33 = 0.221

∂z0/∂w2 = ∂/∂w2 [w0x + w2y] = y = 1

Backpropagation

[Same network, feedforward values, and error E = 0.21 as before]

∂E/∂w1 = (∂E/∂h1) · (∂h1/∂z1) · (∂z1/∂w1) = 0.014 × 0.16 × 1 = 0.0022

∂E/∂h1 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂h1) = (h2 − za) × h2(1 − h2) × w5 = 0.65 × 0.65(1 − 0.65) × 0.1 = 0.014

∂h1/∂z1 = g(z1)(1 − g(z1)) = h1(1 − h1) = 0.8 × 0.2 = 0.16

∂z1/∂w1 = ∂/∂w1 [w1x + w3y] = x = 1

Backpropagation

[Same network, feedforward values, and error E = 0.21 as before]

∂E/∂w3 = (∂E/∂h1) · (∂h1/∂z1) · (∂z1/∂w3) = 0.014 × 0.16 × 1 = 0.0022

∂E/∂h1 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂h1) = (h2 − za) × h2(1 − h2) × w5 = 0.65 × 0.65(1 − 0.65) × 0.1 = 0.014

∂h1/∂z1 = g(z1)(1 − g(z1)) = h1(1 − h1) = 0.8 × 0.2 = 0.16

∂z1/∂w3 = ∂/∂w3 [w1x + w3y] = y = 1

Gradient descent weight updates

[Same XOR network. Summary of the feedforward pass and all six gradients:]

activation function, g(z) = 1 / (1 + e^(−z));  g′(z) = g(z)[1 − g(z)]

Feedforward (x = 1, y = 1): z0 = 0.7, z1 = 1.4, h0 = 0.67, h1 = 0.8, z2 = 0.63, h2 = z = 0.65

Mean Squared Error, E = ½ (z − za)² = ½ (0.65 − 0)² = 0.21

∂E/∂w4 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂w4) = 0.65 × 0.228 × 0.67 = 0.09
∂E/∂w5 = (∂E/∂h2) · (∂h2/∂z2) · (∂z2/∂w5) = 0.65 × 0.228 × 0.8 = 0.118
∂E/∂w0 = (∂E/∂h0) · (∂h0/∂z0) · (∂z0/∂w0) = 0.103 × 0.221 × 1 = 0.0228
∂E/∂w2 = (∂E/∂h0) · (∂h0/∂z0) · (∂z0/∂w2) = 0.103 × 0.221 × 1 = 0.0228
∂E/∂w1 = (∂E/∂h1) · (∂h1/∂z1) · (∂z1/∂w1) = 0.014 × 0.16 × 1 = 0.0022
∂E/∂w3 = (∂E/∂h1) · (∂h1/∂z1) · (∂z1/∂w3) = 0.014 × 0.16 × 1 = 0.0022

Learning rate, α = 0.01

w0 = w0 − α ∂E/∂w0;  w1 = w1 − α ∂E/∂w1;  w2 = w2 − α ∂E/∂w2;
w3 = w3 − α ∂E/∂w3;  w4 = w4 − α ∂E/∂w4;  w5 = w5 − α ∂E/∂w5

[Flow: feedforward → backprop → gradient descent weight update, repeated for each training example]
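Putting the three stages together, here is a compact, self-contained Python sketch (added, not part of the slides) of the full cycle (feedforward, backpropagation, and the gradient-descent weight update) looped over the XOR training set. The variable names and epoch count are my own choices; the network, loss, and learning rate α = 0.01 match the slides. Since the slides' network has no bias terms, the sketch is meant to show the mechanics of the update cycle rather than a fully tuned XOR solver.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initial weights and learning rate from the slides
w = [0.4, 0.6, 0.3, 0.8, 0.7, 0.1]   # w0..w5
alpha = 0.01
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for epoch in range(10000):                      # assumed number of passes
    for (x, y), za in xor_data:
        # feedforward
        h0 = sigmoid(w[0] * x + w[2] * y)
        h1 = sigmoid(w[1] * x + w[3] * y)
        h2 = sigmoid(w[4] * h0 + w[5] * h1)     # network output z
        # backprop (chain rule, as on the preceding slides)
        delta_out = (h2 - za) * h2 * (1 - h2)
        grads = [delta_out * w[4] * h0 * (1 - h0) * x,   # dE/dw0
                 delta_out * w[5] * h1 * (1 - h1) * x,   # dE/dw1
                 delta_out * w[4] * h0 * (1 - h0) * y,   # dE/dw2
                 delta_out * w[5] * h1 * (1 - h1) * y,   # dE/dw3
                 delta_out * h0,                         # dE/dw4
                 delta_out * h1]                         # dE/dw5
        # gradient descent weight update
        w = [wi - alpha * gi for wi, gi in zip(w, grads)]

print(w)
```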
