Neural networks - Zagazig University
Neural networks
1
Dynamic networks: Recurrent neural networks
They learn a nonstationary I/O mapping, Y(t) = f(X(t)), where X(t) and Y(t) are time-varying patterns
They model dynamic systems: control systems, optimization problems, artificial vision and speech recognition tasks, time series predictions
2
Introduction
Dynamic networks
Equipped with a temporal dynamics, these networks are able to capture the temporal structure of the input and to "produce" a time-varying output
Temporal dynamics: unit activations can change even in the presence of the same input pattern
Architectures composed of units having feedback connections, either between neurons belonging to the same layer or to different layers
Partially recurrent networks
Recurrent networks
3
Partially recurrent networks
Feedforward networks equipped with a set of input units, called state or context units
The context layer output corresponds to the output, at the previous time step, of the units that emit feedback signals, and it is sent to the units receiving feedback signals
Elman network (1990)
Jordan network (1986)
4
Elman networks 1
The output of each context unit is equal to that of the corresponding hidden unit at the previous (discrete) instant:
x_c,i(t) = x_h,i(t-1)
To train the network, the Backpropagation algorithm is used, in order to learn the hidden-to-output, the input-to-hidden and the context-to-hidden weights
5
Feedback connections on the hidden layer, with fixed weights all equal to one
Context units equal, in number, to the hidden units, and considered just as input units
All the output functions operate on the weighted sum of the inputs, except for the input and the context layers (that act just as "buffers")
Actually, sigmoidal functions are used in both the hidden and the output layer
The context layer inserts a single-step delay in the feedback loop: the output of the context layer is presented to the hidden layer, in addition to the current pattern
The context layer adds, to the current input, a value that reproduces the output achieved at the hidden layer based on all the patterns presented up to the previous step
6
Elman networks 2
Learning: all the trainable weights are attached to forward connections
1) The activation of the context units is initially set to zero, i.e. x_c,i(1) = 0, for all i, at t = 1
2) Input pattern x_t: evaluation of the activations/outputs of all the neurons, based on the feedforward transmission of the signal along the network
3) Weight updating using Backpropagation
4) Let t = t + 1 and go to 2)
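The four steps above can be sketched in NumPy (a minimal illustration, not from the slides: the layer sizes, the tanh hidden activation, the linear output layer and the learning rate are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2                 # assumed sizes
W_ih = rng.normal(scale=0.1, size=(n_hid, n_in))    # input   -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hid, n_hid))   # context -> hidden (trainable)
W_ho = rng.normal(scale=0.1, size=(n_out, n_hid))   # hidden  -> output
lr = 0.01

def step(x, x_c):
    """One time step: the context acts as an extra input buffer."""
    x_h = np.tanh(W_ih @ x + W_ch @ x_c)     # weighted sum of input and context
    y = W_ho @ x_h                           # linear output layer (assumption)
    return x_h, y

# 1) context initially zero; 2)-4) loop over the sequence
x_c = np.zeros(n_hid)
sequence = [(rng.normal(size=n_in), rng.normal(size=n_out)) for _ in range(4)]
for x, target in sequence:
    x_h, y = step(x, x_c)
    # 3) standard Backpropagation, treating the context as a plain input
    e = y - target                           # output error
    g_ho = np.outer(e, x_h)
    delta_h = (W_ho.T @ e) * (1 - x_h**2)    # tanh derivative
    g_ih = np.outer(delta_h, x)
    g_ch = np.outer(delta_h, x_c)
    W_ho -= lr * g_ho; W_ih -= lr * g_ih; W_ch -= lr * g_ch
    x_c = x_h                                # fixed unit-weight copy: context = previous hidden
```

Note that the hidden-to-context copy uses fixed unit weights, as on the slides; only the forward connections are updated.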
The Elman network produces a finite sequence of outputs, one for each input
The Elman network is normally used for object trajectory prediction, and for the generation/recognition of linguistic patterns
7
Elman networks 3
Elman MATLAB code
• elmannet(layerDelays, hiddenSizes, trainFcn)
• Ex: here an Elman neural network is used to solve a simple time series problem:
  [X,T] = simpleseries_dataset;
  net = elmannet(1:2,10);
  [Xs,Xi,Ai,Ts] = preparets(net,X,T);
  net = train(net,Xs,Ts,Xi,Ai);
  view(net)
  Y = net(Xs,Xi,Ai);
  perf = perform(net,Ts,Y)
8
Jordan networks 1
Feedback connections on the output layer, with fixed weights all equal to one
Self-feedback connections for the state neurons, with constant weights equal to a; a < 1 is the recency constant
9
The network output is sent to the hidden layer by using a context layer
The activation, for the context units, is determined based on the activation of the same neurons and of the output neurons, both calculated at the previous time step:
x_c,i(t) = x_o,i(t-1) + a·x_c,i(t-1)
Self-connections allow the context units to develop a local or "individual" memory
To train the network, the Backpropagation algorithm is used, in order to learn the hidden-to-output, the input-to-hidden and the context-to-hidden weights
10
Jordan networks 2
The context layer inserts a delay step in the feedback loop: the context layer output is presented to the hidden layer, in addition to the current pattern
The context layer adds, to the input, a value that reproduces the output achieved by the network based on all the patterns presented up to the previous step, coupled with a fraction of the value calculated, also at the previous step, by the context layer itself (via self-connections)
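The Jordan context update x_c,i(t) = x_o,i(t-1) + a·x_c,i(t-1) can be illustrated with a minimal sketch (the value a = 0.5 and the toy output sequence are assumptions, not from the slides):

```python
import numpy as np

a = 0.5                                  # recency constant, a < 1 (assumed value)
n_out = 2
x_c = np.zeros(n_out)                    # context starts at zero
outputs = [np.array([1.0, 0.0]),         # x_o(t-1) at successive steps (toy data)
           np.array([0.0, 1.0])]
trace = []
for x_o in outputs:
    x_c = x_o + a * x_c                  # self-connection keeps a decaying trace
    trace.append(x_c.copy())
# after two steps: x_c = [0.5, 1.0] -- the older output survives, scaled by a
```

The self-connection is what gives each context unit its "individual" memory: older outputs fade geometrically with factor a instead of being overwritten.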
11
Jordan networks 3
Recurrent networks 1
A neural network is said to be recurrent if it contains some neurons whose activations depend directly or indirectly on their own outputs
In other words, following the signal transmission through the network, cyclic paths exist that connect one or more neurons with themselves:
without crossing other neurons: direct feedback (x_i(t) explicitly appears in the evaluation of a_i(t+1), where a_i(·) and x_i(·) respectively represent the activation and the output of neuron i)
and/or crossing other neurons: indirect feedback
A fully connected neural network is always a recurrent network
12
Recurrent networks 2
13
RNN with lateral feedbacks
Fully connected RNN
RNN with self feedbacks
A recurrent network processes a temporal sequence by using an internal state representation, that appropriately encodes all the past information injected into its inputs
Memory arises from the presence of feedback loops between the output of some neurons and the input of other neurons belonging to the same/previous layers; assuming a synchronous update mechanism, the feedback connections have a memory element (a one-step delay)
The inputs are sequences of arrays u_p(1), …, u_p(T_p), where T_p represents the length of the p-th sequence
(in general, sequences of finite length are considered, even if this is not a necessary requirement)
14
Recurrent networks 3
Using an MLP as the basic block, multiple types of recurrent networks may be defined, depending on which neurons are involved in the feedback
The feedback may be established from the output to the input neurons
The feedback may involve the output of the hidden layer neurons
In the case of multiple hidden layers, feedbacks can also be present on several layers
Therefore, many different configurations are possible for a recurrent network
Most common architectures exploit the ability of MLPs to implement nonlinear functions, in order to realize networks with a nonlinear dynamics
15
Recurrent networks 4
The behaviour of a recurrent network (during a time sequence) can be reproduced by unfolding it in time, obtaining the corresponding feedforward network
16
x(t) = f(x(t-1), u(t))
y(t) = g(x(t), u(t))
[Figure: a recurrent network with input u and output y, and its unfolding in time]
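The two state equations can be sketched as a NumPy loop (f, g and all matrix sizes are assumptions of this illustration: a tanh state transition with a linear readout):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(scale=0.1, size=(4, 4))   # state  -> state
B = rng.normal(scale=0.1, size=(4, 3))   # input  -> state
C = rng.normal(scale=0.1, size=(2, 4))   # state  -> output
D = rng.normal(scale=0.1, size=(2, 3))   # input  -> output

def f(x_prev, u):
    """State transition x(t) = f(x(t-1), u(t)) -- assumed tanh form."""
    return np.tanh(A @ x_prev + B @ u)

def g(x, u):
    """Output map y(t) = g(x(t), u(t)) -- assumed linear form."""
    return C @ x + D @ u

x = np.zeros(4)                          # initial state x(0)
ys = []
for u in rng.normal(size=(5, 3)):        # a length-5 input sequence (toy data)
    x = f(x, u)                          # new state from old state and input
    ys.append(g(x, u))                   # output from current state and input
```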
Recurrent networks 5
Recurrent processing
Before starting to process the p-th sequence, the state of the network must be initialized to an assigned value (initial state) x_p(0)
Every time the network begins to process a new sequence, there occurs a preliminary "reset" to the initial state, losing the memory of the past processing phases; that is, we assume to process each sequence independently from the others
At each time step, the network calculates the current output of all the neurons, starting from the input u_p(t) and from the state x_p(t-1)
17
Processing modes
Let us suppose that the L-th layer represents the output layer
The neural network can be trained to transform the input sequence into an output sequence of the same length (realizing an Input/Output transduction)
A different case is when we are interested only in the network response at the end of the sequence, so as to transform the sequence into a vector
This approach can be used to associate each sequence to a class in a set of predefined classes
18
Learning in recurrent networks
Backpropagation Through Time (BPTT; Rumelhart, Hinton, Williams, 1986)
The temporal dynamics of the recurrent network is "converted" into that of the corresponding unfolded feedforward network
Advantage: very simple to calculate
Disadvantage: heavy memory requirements
Real-Time Recurrent Learning (Williams, Zipser, 1989)
Recursive calculation of the gradient of the cost function associated with the network
Disadvantage: computationally expensive
19
Learning Set
Let us consider a supervised learning scheme inwhich:
input patterns are represented by sequences
target values are represented by subsequences
Therefore, the supervised framework is supposed to provide a desired output only with respect to a subset of the processing time steps
In the case of sequence classification (or sequence coding into vectors) there will be a single target value, at time T_p
20
Cost function
The learning set is composed of sequences, each associated with a target subsequence, where ϵ stands for empty positions, possibly contained in the target sequence
The cost function, measuring the difference between the network output and the target sequence, for all the examples belonging to the learning set, is defined by
E(W) = Σ_p Σ_{t_i} e_p^W(t_i)
where the sum runs over the time steps t_i at which a target is defined, and the instantaneous error e_p^W(t_i) is expressed as the Euclidean distance between the output vector and the target vector (but other distances may also be used)
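A minimal sketch of such a cost function (representing the empty positions ϵ as None is an assumption of this illustration, not something the slides specify):

```python
import numpy as np

def sequence_cost(outputs, targets):
    """Sum of squared Euclidean distances over the supervised steps only."""
    cost = 0.0
    for y, d in zip(outputs, targets):
        if d is None:                    # ϵ: no target at this time step
            continue
        cost += np.sum((np.asarray(y) - np.asarray(d)) ** 2)
    return cost

# sequence classification: a single target, at the last step T_p
outs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
targs = [None, None, [0.0, 1.0]]
# sequence_cost(outs, targs) -> 0.0 (the final output matches the target)
```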
21
BackPropagation Through Time 1
Given the targets to be produced, the network can be trained using BPTT
Using BPTT means…
…considering the corresponding feedforward network unfolded in time: the length T_p of the sequence to be learnt must be known
…updating all the weights w_i(t), t = 1, …, T_p, in the feedforward network, which are copies of the same w_i in the recurrent network, by the same amount, corresponding to the sum of the various updates reported in different layers; all the copies of w_i(t) should be maintained equal
22
Let N be a recurrent network that must be trained, starting from t = 0, on a sequence of length T_p
On the other hand, let N* be the feedforward network obtained by unfolding N in time
With respect to N* and N, the following statements hold:
N* has a "layer" that contains a copy of N, corresponding to each time step
Each layer in N* collects a copy of all the neurons contained in N
For each time step t ∈ [0, T_p], the synapse from neuron i in layer l to neuron j in layer l+1 in N* is just a copy of the same synapse in N
23
BackPropagation Through Time 2
24
BackPropagation Through Time 3
Feedforward network corresponding to a sequence of length T = 4
Recurrent network
The gradient calculation may be carried out in a feedforward-network-like style
The algorithm can be derived from the observation that recurrent processing in time is equivalent to constructing the corresponding unfolded feedforward network
The unfolded network is a multilayer network, on which the gradient calculation can be realized via standard Backpropagation
The constraint that each replica of the recurrent network within the unfolded network must share the same set of weights has to be taken into account (this constraint simply imposes to accumulate the gradient related to each weight with respect to each replica during the network unfolding process)
25
BackPropagation Through Time 4
The meaning of backpropagation through time is highlighted by the idea of network unfolding
The algorithm is non-local in time (the whole sequence must be processed, storing all the neuron outputs at each time step) but it is local in space, since it uses only variables local to each neuron
It can be implemented in a modular fashion, based on simple modifications to the Backpropagation procedure, normally applied to static MLP networks
26
BackPropagation Through Time 5
The simplest dynamic data type is the sequence, which is a natural way to model temporal domains
In speech recognition, the words, which are the object of the recognition problem, naturally flow to constitute a temporal sequence of acoustic features
In molecular biology, proteins are organized in amino-acid strings
The simplest dynamic architectures are recurrent networks, able to model temporal/sequential phenomena
27
Structured domains
In many real-world problems, the information is naturally collected in structured data, that have a hybrid nature, both symbolic and subsymbolic, and cannot be represented regardless of the links between some basic entities:
Classification of chemical compounds
Analysis of DNA regulatory networks
Theorem proving
Pattern recognition
World Wide Web
28
Example 1: Inference of chemical properties
Chemical compounds are naturally represented as graphs (undirected and cyclic)
29
Example 2: Analysis of DNA regulatory networks
A gene regulatory network is a collection of regulators that interact with each other to govern the gene expression levels of mRNA and proteins
30
Example 4: Pattern recognition
Each node of the tree contains local features, such as area, perimeter, shape, color, etc., of the related object, while branches denote inclusion relations
31
Feedforward vs. recurrent NN
[Figure: a feedforward network and a recurrent network, each with input and output units]
Feedforward NN:
• connections only "from left to right", no connection cycle
• activation is fed forward from input to output through "hidden layers"
• no memory
Recurrent NN:
• at least one connection cycle
• activation can "reverberate", persist even with no input
• system with memory