
Page 1

Reservoir Computing Approaches

Claudio Gallicchio

1

Page 2

Introduction

A randomized approach employs a degree of randomness as part of its constructive strategy.

Randomness can be viewed as an effective ingredient of ML approaches.

Randomization can enhance several aspects of ML methodologies: data processing, learning, hyper-parameter selection.

Randomization as a key factor for building efficient ML models.

Focus: Randomized Recurrent Neural Network approaches

2

Page 3

Randomness in Machine Learning

Randomness can be present at different levels:

To enhance predictive performance
To alleviate difficulties of classical ML methodologies

Randomness in specific aspects of ML:
Data processing
Learning and hyper-parameter selection
In the core design of the model

3

Page 4

Randomness in Machine Learning

Data splitting to avoid overfitting: hold-out, k-fold cross-validation, leave-one-out
Data generation: bootstrap, balancing techniques
Learning: e.g. order of presentation of the observations
Hyper-parameter selection

4

[Figure: the data set split into training (TR), validation (VL) and test (TS) partitions, e.g. TR1/VL1/TS1, TR2/VL2/TS2, TR3/VL3/TS3.]

Page 5

Intrinsically Random Models

Randomness as a key element of the ML model

Random neurons of Restricted Boltzmann Machines

Extremely Randomized Trees, Random Forests

5

Pool/ensemble of randomized decision trees

Each tree is built from samples randomly drawn from the training set

Randomized selection of the input variables to choose for splitting the nodes

Page 6

Randomized Neural Network Approaches

In the Neural Network (NN) area, randomness gave rise to a rich set of models.
Roots are in the pioneering work on the Perceptron.
Other pioneering works pointed out the relevance of learning in the output layer w.r.t. the hidden layer.
Recently, randomized NNs delineate a research line of growing interest.
Particularly suitable when efficiency is of primary relevance.

6

[Figure: Rosenblatt's perceptron organization: RETINA → AI PROJECTION AREA → AII ASSOCIATION AREA → RESPONSES.]

Page 7

General Architectural Structure

Non-linearly embed the input into a high-dimensional feature space where the problem is more likely to be solved linearly

Theoretically grounded basis (Cover’s Th.)

7

[Figure: input → untrained hidden layer (feature representation) → trained readout layer → output.]

Untrained hidden layer

Trained readout layer
• Combines the features in the hidden space for output computation
• Typically implemented by linear models
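As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this architecture in the feed-forward case: a fixed random non-linear expansion of the input followed by a linear readout trained in closed form. All names (random_features, train_readout, n_hidden, lam) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def random_features(U, n_hidden=200, scale=1.0):
    # untrained hidden layer: fixed random non-linear expansion of the input
    n_inputs = U.shape[1]
    W_in = rng.uniform(-scale, scale, size=(n_inputs, n_hidden))
    b = rng.uniform(-scale, scale, size=n_hidden)
    return np.tanh(U @ W_in + b)

def train_readout(H, Y, lam=1e-3):
    # trained readout: linear model fit in closed form (ridge regression)
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)

# toy usage: 500 samples with 3 input features and a scalar target
U = rng.standard_normal((500, 3))
Y = np.sin(U.sum(axis=1, keepdims=True))
H = random_features(U)
W_out = train_readout(H, Y)
Y_pred = H @ W_out   # outputs are linear combinations of the random features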

Page 8

Dynamical Recurrent Models

Add recurrent connections to a feed-forward architecture in order to deal with temporal data in a natural fashion.
Computation is based on dynamical systems.

8

dynamical component

feed-forward component

Page 9

Recurrent Neural Networks (RNNs)

Neural network architecture with explicit recurrent connections.
Feedbacks allow the representation of the temporal context in the state (neural memory).
Discrete-time non-autonomous dynamical system.
Potentially, the input history can be maintained for arbitrary periods of time.
Theoretically very powerful: universal approximation through learning.

9

Page 10

Learning with RNNs (repetita)

Universal approximation of RNNs (e.g. SRN, NARX) through learning.
Training algorithms involve some downsides that you already know:
Relatively high computational training costs and potentially slow convergence
Local minima of the error function (which is generally non-convex)
Vanishing of the gradients and the problem of learning long-term dependencies

10

Page 11

Liquid State Machines
W. Maass, T. Natschlaeger, H. Markram (2002)

Originated from the study of biologically inspired spiking neurons.
The liquid should satisfy a pointwise separation property.
Dynamics provided by a pool of spiking neurons with bio-inspired architecture.

11

W. Maass, T. Natschlaeger, and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11), 2531-2560 (2002)

Integrate-and-fire

Izhikevich

Page 12

Fractal Prediction Machines
P. Tino, G. Dorffner (2001)

Contractive Iterated Function Systems

Fractal Analysis

12

Tino, P., Dorffner, G.: Predicting the future of discrete sequences from fractal representations of the past. Machine Learning 45 (2001) 187-218

Page 13

Echo State Networks
H. Jaeger (2001)

Control the spectral properties of the recurrence matrix

Echo State Property

13

Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001)

Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (2004) 78-80

Page 14

Reservoir Computing

14

readout weight matrix

input weight matrix

reservoir weight matrix

non-linearity of the expansion

The untrained reservoir hidden layer is made up of recurrent units.
The reservoir implements a pool of random filters that provides memory.

Page 15

15

Echo State Networks

Page 16

Echo State Network: Architecture

16

Input space: R^NU; Reservoir state space: R^NR; Output space: R^NY

Reservoir: untrained, large, sparsely connected, non-linear layer

Readout: trained, linear layer

Page 17

Reservoir: State Computation

17

It is also useful to consider the iterated version of the state transition function, i.e., the reservoir state after the presentation of an entire input sequence.

The reservoir layer implements the state transition function of the dynamical system
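The state transition function itself is not reproduced in the transcript; in standard ESN notation, consistent with the algorithmic description on page 24, it reads

x(t) = tanh( W_in u(t) + Ŵ x(t−1) )

and its iterated version, x(N_s) = F̂([u(1), ..., u(N_s)], x(0)), gives the reservoir state after the presentation of an entire input sequence.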

Page 18

Echo State Property (ESP)

A valid ESN should satisfy the "Echo State Property" (ESP).
Def. An ESN satisfies the ESP whenever:

The state of the network asymptotically depends only on the driving input signal.

Dependencies on the initial conditions are progressively lost.
In the literature it is characterized by the equivalence among state contractivity, state forgetting and input forgetting.

18

Page 19

Conditions for the ESP

The ESP can be guaranteed by controlling the spectral properties of the recurrent weight matrix Ŵ.

Theorem. If the maximum singular value of Ŵ is less than 1, then the ESN satisfies the ESP.

Sufficient condition for the ESP (contractive dynamics)

Theorem. If the spectral radius of Ŵ is greater than 1, then the ESN does not satisfy the ESP.

Necessary condition for the ESP (stable dynamics)

Recall: the spectral radius is the maximum among the absolute values of the eigenvalues.
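A small NumPy sketch (illustrative only, not from the slides) of how the two conditions can be checked, and the necessary one enforced, on a randomly generated reservoir matrix:

import numpy as np

rng = np.random.default_rng(42)
Nr = 100
W_hat = rng.uniform(-1.0, 1.0, size=(Nr, Nr))    # randomly generated reservoir matrix

sigma_max = np.linalg.norm(W_hat, 2)             # largest singular value
rho = np.max(np.abs(np.linalg.eigvals(W_hat)))   # spectral radius

print("sufficient condition (sigma_max < 1):", sigma_max)
print("necessary condition  (rho < 1):      ", rho)

# common practice: rescale W_hat so that its spectral radius is slightly below 1
W_hat = W_hat * (0.9 / rho)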

19

Page 20

ESN Initialization: How to setup the Reservoir

20

Elements in W_in are selected randomly in [−scale_in, scale_in].

Ŵ initialization procedure:

Start with a randomly generated matrix Ŵ_rand

Scale Ŵ_rand to meet the condition for the ESP (usually: the necessary one)

Page 21

ESN Training

21

Run the network on the whole input sequence and collect the reservoir states.
Discard an initial transient (washout).
Solve the least squares problem defined by the collected states and the target outputs.

Page 22

Training the Readout

22

On-line training is not the standard choice for ESNs:
Least Mean Squares is typically not suitable, due to the high eigenvalue spread (i.e. large condition number) of X
Recursive Least Squares is more suitable

Off-line training is standard in most applications: closed-form solution of the least squares problem by direct methods
Moore-Penrose pseudo-inversion (possible regularization using random noise)
Ridge regression

λ_r is a regularization coefficient
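For reference, the ridge-regression readout used in the algorithmic description on page 24 has the standard closed form

W_out = Y_target X' ( X X' + λ_r I )^(−1)

where X collects the reservoir states column-wise, Y_target the corresponding target outputs, and X' denotes the transpose of X.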

Page 23

Training the Readout/2

23

Multiple readouts for the same reservoir: solving more than one task with the same reservoir dynamics.

Other choices for the readout: Multi-layer Perceptron, Support Vector Machine, K-Nearest Neighbor, ...

Page 24

ESN – Algorithmic Description

Initialization
    Win = 2*rand(Nr,Nu) - 1;  Win = scale_in * Win;
    Wh  = 2*rand(Nr,Nr) - 1;  Wh  = rho * (Wh / max(abs(eig(Wh))));
    state = zeros(Nr,1);
    X = [];

Run the reservoir on the input stream
    for t = 1:Nt
        state = tanh(Win * u(:,t) + Wh * state);   % u is the Nu x Nt input stream
        X(:,end+1) = state;
    end

Discard the washout
    X = X(:,Nwashout+1:end);

Train the readout
    Wout = Ytarget(:,Nwashout+1:end) * X' * inv(X*X' + lambda_r*eye(Nr));

The ESN is now ready for operation (estimations/predictions).

24

Page 25

ESN Hyper-parameterization (Model Selection)

ESNs are easy to implement and efficient to train, but they require a careful selection of several hyper-parameters.

Major hyper-parameters:
• reservoir dimension
• spectral radius
• input scaling
• readout regularization

Other hyper-parameters:
• reservoir sparsity
• non-linearity of the reservoir activation function
• input bias
• length of the transient/washout
• architectural design

25

The ESN hyper-parametrization should be chosen carefully through an appropriate model selection procedure.

Page 26

ESN Major Architectural Variants

26

direct connections from the input to the readout

feedback connections from the output to the reservoir

might affect the stability of the network's dynamics; small values are typically used

Page 27

ESN for Sequence-to-Element Tasks

The learning problem requires one single output for each input sequence.
Granularity of the task is on entire sequences (not on time-steps); example: sequence classification.

27

[Figure: the input sequence s = [u(1), ..., u(N_s)] drives the reservoir through the states x(1), x(2), ..., x(N_s) over time; a single output element y(s) is computed from a sequence-level state x(s).]

Last state:  x(s) = x(N_s)

Mean state:  x(s) = (1/N_s) Σ_{t=1..N_s} x(t)

Sum state:   x(s) = Σ_{t=1..N_s} x(t)
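A minimal NumPy sketch of the three state mapping functions (illustrative; it assumes the reservoir states of a sequence are stacked row-wise in an array of shape (N_s, Nr)):

import numpy as np

def sequence_state(states, mode="mean"):
    # states: array of shape (N_s, Nr), one reservoir state per time step
    if mode == "last":
        return states[-1]
    if mode == "mean":
        return states.mean(axis=0)
    if mode == "sum":
        return states.sum(axis=0)
    raise ValueError("unknown state mapping: " + mode)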

Page 28

Leaky Integrator ESN (LI-ESN)

28

Use leaky integrator reservoir units: apply an exponential moving average to the reservoir states.

Low-pass filter, to better handle input signals that change slowly with respect to the sampling frequency.

a ∈ (0, 1] is the leaking rate parameter: it controls the speed of the reservoir dynamics in reaction to the input.
Smaller values imply a reservoir that reacts more slowly to input changes; if a = 1, standard ESN dynamics are obtained.
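The leaky state update itself is not reproduced in the transcript; a commonly used formulation (e.g. in Jaeger's leaky-integrator ESNs and in Lukoševičius' practical guide) is

x(t) = (1 − a) x(t−1) + a tanh( W_in u(t) + Ŵ x(t−1) )

which reduces to the standard ESN update for a = 1.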

Page 29

29

Examples of Applications

Page 30

Applications of ESNs: Examples /1
Mackey-Glass time series

30

[Figure: ESN performance on the MG17 task as a function of the contraction coefficient and of the reservoir dimension.]

For delay α > 16.8 the system has a chaotic attractor; the most used values are 17 and 30.
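For reference (not shown in the transcript), the Mackey-Glass system in its standard benchmark form is the delay differential equation

dx(t)/dt = 0.2 x(t−α) / (1 + x(t−α)^10) − 0.1 x(t)

where α denotes the delay; MG17 corresponds to α = 17.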

Page 31

Applications of ESNs: Examples /2 Forecasting of indoor user movements

31

Generalization of predictive performance to unseen environments

Dataset available online on the UCI repository:
https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data

Deployed WSN: 4 fixed sensors (anchors) & 1 sensor worn by the user (mobile).
Predict if the user will change room when she is in position M.

Input: received signal strength (RSS) data from the 4 anchors (10-dimensional vector for each time step, noisy data).
Target: binary classification (change environmental context or not).

http://fp7rubicon.eu/

Page 32

Applications of ESNs: Examples /2 Forecasting of indoor user movements – Input data

32

example of the RSS traces gathered from all the 4 anchors in the WSN, for different possible movement paths

Page 33

Applications of ESNs: Examples /3 (repetita)
Human Activity Recognition (HAR) and Localization

33

Input from heterogeneous sensor sources (data fusion)
Predicting event occurrence and confidence
High accuracy of event recognition/indoor localization: > 90% on test data
Effectiveness in learning a variety of HAR tasks
Effectiveness in training on new events

Page 34

Applications of ESNs: Examples /4 Robotics

34

Indoor localization estimation in critical environment (Stella Maris Hospital)

Precise robot localization estimation using noisy RSSI data (35 cm)

Recalibration in case of environmental alterations or sensor malfunctions

Input: temporal sequences of RSSI values (10 dimensional vector for each time step, noisy data)

Target: temporal sequences of laser-based localization (x,y)

Page 35

Applications of ESNs: Examples /5
Prediction of the electricity price on the Italian market

35

Accurate prediction of hourly electricity price (less than 10% MAPE error)

Page 36

Applications of ESNs: Examples /6
Speech and Text Processing: EVALITA 2014 – Emotion recognition track (Sentiment Analysis)

36

[Figure: waveform of the speech signal → average reservoir state → emotion class (7 classes).]

Challenge: the reservoir encodes the temporal input signals, avoiding the need of explicitly resorting to fixed-size feature extraction.

Promising performance, already in line with the state of the art.

Page 37

Applications of ESNs: Examples /7 Human Activity Recognition

37

Classification of human daily activities from RSS data generated by sensors worn by the user

Input: temporal sequences of RSS values (6 dimensional vector for each time step, noisy data)

Target: classification of human activity (bending, cycling, lying, sitting, standing, walking)

Extremely good accuracy (≈ 0.99) and F1 score (≈ 0.96)

2nd Prize at the 2013 EvAAL International Competition

Dataset available online on the UCI repository:
http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29

Page 38

Applications of ESNs: Examples /8 Health-care monitoring

38

The DOREMI Project: decrease of cognitive decline, malnutrition and sedentariness in the aging population.
Development of a lifestyle management system exploiting intelligent sensor networks to realize smart environments for context awareness.
Activity recognition tasks in the context of health monitoring (e.g. balance quality, physical activity, socialization, ...).
RC networks used for assessing balance abilities and other relevant health/social parameters/indicators in elderly people.

http://www.doremi-fp7.eu/

Project funded by the European Union, grant agreement n. 611650. Coordination by IFC-CNR.

Page 39

Applications of ESNs: Examples /8

39

Autonomous Balance Assessment

[Figure: Wii Balance Board used to administer the BBS test.]

An unobtrusive automatic system for balance assessment in the elderly.
Berg Balance Scale (BBS) test: 14 exercises/items (~30 min.)

Input: stream of pressure data gathered from the 4 corners of the Nintendo Wii board during the execution of just 1 (out of the 14) BBS exercises.
Target: global BBS score of the user (0-56).
The use of RNNs allows to automatically exploit the richness of the signal dynamics.

Page 40

Applications of ESNs: Examples /8

40

Autonomous Balance Assessment: excellent prediction performance using LI-ESNs.

Practical example of how performance can be improved in a real-world case:
By an appropriate design of the task, e.g. inclusion of clinical parameters in input
By appropriate choices for the network design, e.g. by using a weight sharing approach on the input-to-reservoir connections

LI-ESN model             Test MAE (BBS points)   Test R
standard                 4.80 ± 0.40             0.68
+ weight                 4.62 ± 0.30             0.69
LR weight sharing (ws)   4.03 ± 0.13             0.71
ws + weight              3.80 ± 0.17             0.76

Very good comparison with related models (MLPs, TDNN, RNNs, NARX, ...) and with literature approaches.

Page 41

41

Echo State Property

Page 42

Echo State Property

Assumption: input and state spaces are compact sets.
A reservoir network whose state update is ruled by the state transition function satisfies the ESP if initial conditions are asymptotically forgotten.

The state dynamics provide "echoes" of the driving input. Essentially, this is a stability condition.
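The formal statement is omitted in the transcript; with F̂ the iterated state transition function, s_n any input sequence of length n, and x, x' any two initial states, the ESP can be written as

‖ F̂(s_n, x) − F̂(s_n, x') ‖ → 0   as n → ∞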

42

Page 43

Echo State Property

Why is a stable regime so important? An unstable network exhibits sensitivity to input perturbations: two slightly different input sequences drive the network into different states.

Good for training: the state vectors tend to be more and more linearly separable (for any given task).

Bad for generalization: overfitting! No generalization ability if a temporal sequence similar to one in the training set drives the network into completely different states.

43

Page 44

ESP: Sufficient Condition

The sufficient condition for the ESP analyzes the case of contractive dynamics of the state transition function.
Whatever the driving input signal is, if the system is contractive then it will exhibit stability.

44

Page 45

Contractivity

45

The reservoir state transition function rules the evolution of the corresponding dynamical system

Def. The reservoir has contractive dynamics whenever its state transition function F is Lipschitz continuous with constant C < 1
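Spelled out (the displayed formula is missing from the transcript), this means that for every input u and every pair of states x, x':

‖ F(u, x) − F(u, x') ‖ ≤ C ‖ x − x' ‖,   with C < 1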

Page 46

Contractivity and the ESP

Theorem. If an ESN has a contractive state transition function F, then it satisfies the Echo State Property.
Assumption: F is contractive with parameter C < 1.

46

Contractivity ⇒ ESP: the distance between the states obtained from any two initial conditions goes to 0 as n goes to infinity, hence the ESP holds.
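The standard contraction argument behind this statement (omitted in the transcript): iterating F over any input sequence s_n of length n gives

‖ F̂(s_n, x) − F̂(s_n, x') ‖ ≤ C^n ‖ x − x' ‖

which goes to 0 as n → ∞, so initial conditions are forgotten and the ESP holds.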

Page 47

Contractivity and Reservoir Initialization

If the reservoir is initialized to implement a contractive mapping, then the ESP is guaranteed (in any norm, for any input).

Formulation of the sufficient condition for the ESP. Assumptions:
Euclidean distance as metric in the state space
Reservoir units with tanh activation function (note: squashing nonlinearities bound the state space)

47

Page 48

Contractivity, Markovianity and ESNs

Markovianity and contractivity: Iterated Function Systems, fractal theory, architectural bias of RNNs.
RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines.
RNN dynamics (before training) are constrained in a region of the state space with Markovian characterization.
Contractivity (in any norm) of the state transition function implies echo states: ESNs are featured by fixed contractive dynamics.
Relations with the universality of RC for bounded memory computation (LSM theory).

Markovian nature: states assumed in correspondence of different input sequences sharing a common suffix are close to each other proportionally to the length of the common suffix.

48

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929

Maass, W., Natschlager, T., Markram, H.: Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14 (2002) 2531-2560

Page 49

Markovianity

Input sequences sharing a common suffix drive the system into close states.

The states are close to each other proportionally to the length of the common suffix

Inherent ability to discriminate among different input histories in a suffix-based fashion without training of reservoir weights

Why/when do ESNs work? The target should match the Markovian property of the reservoir state space organization.

49

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

Page 50

Markovianity: A practical case

Representation of the ESN state space (by PCA). The network is driven by a symbolic sequence. The reservoir state space naturally organizes in a fractal way.

50

[Figure: PCA of the reservoir state space (σ = 0.3); the 1st principal component reflects the last input symbol, the 2nd principal component the next-to-last input symbol.]

Gallicchio, C., Micheli, A. (2010). A Markovian characterization of redundancy in echo state networks by PCA. In Proceedings of ESANN 2010.

Page 51

ESP: Necessary Condition

Investigating the stability of reservoir dynamics from a dynamical system perspective.

Theorem. If an ESN has unstable dynamics around the zero state and the zero sequence is an admissible input, then the ESP is not satisfied.

Study the stability of the linearized dynamical system around the 0 state and for 0 input:

51

Page 52

ESP: Necessary Condition

The linearized system now reads:
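(The displayed equation is missing from the transcript; since tanh'(0) = 1, linearizing the state update around the zero state and for zero input gives x(t) = Ŵ x(t−1).)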

0 is a fixed point. Is it stable?

If ρ(Ŵ) < 1 the fixed point is stable.
Otherwise: if we start from a state near 0 and we drive the network with the null sequence, we do not end up in 0.
The null sequence is a counter-example: the ESP does not hold!

52

Page 53

60

Reservoir Computing for Structured Domains

Page 54

Learning in Structured Domains

61

• In many real-world application domains the information of interest can be naturally represented by means of structured data representations.
• The problems of interest can be modeled as regression or classification tasks on structured domains.

[Figure: examples of structured data: sequences (MG chaotic time series prediction), trees (QSPR analysis of alkanes, boiling point prediction), graphs (Predictive Toxicology Challenge, targets in {−1, +1}).]

Page 55

Learning in Structured Domains

62

Learning in domains of trees and graphs opens up a wide range of research directions:
• Theoretical
• Experimental analysis in interdisciplinary areas
  • QSAR/QSPR
  • Computational Toxicology, Cheminformatics
  • Social and Web information processing
  • Document processing
  • Parallel computation
  • ...

Problems
Learning in structured domains entails a number of open research problems, mainly related to the increasing complexity of the data domains to treat:
• Efficiency
• Generalization of the class of data structures supported
• Adaptivity
• Generalization ability

Page 56

Tree Echo State Networks

63

Extend the applicability of Reservoir Computing approaches to tree structured data

Extremely efficient approach for modeling Recursive Neural Networks (RecNN)

Architectural and experimental baseline for RecNNs with fully trained recurrent weights

Gallicchio, C., Micheli, A. (2013). Tree echo state networks. Neurocomputing, 101, 319-337.

Page 57

TreeESN: Reservoir

Large, sparsely connected, untrained layer of non-linear recursive units.
Input-driven dynamical system on discrete tree structures.
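The recursive state transition is not written out in the transcript; in the TreeESN formulation (Gallicchio & Micheli, 2013) it can be sketched, for a node n with label u(n) and children ch_1(n), ..., ch_K(n), as

x(n) = tanh( W_in u(n) + Σ_{k=1..K} Ŵ x(ch_k(n)) )

with null states used in place of missing children.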

64

Page 58

TreeESN: State Mapping and Readout

State mapping function for tree-to-element tasks: one single output vector (unstructured) is required for each input tree (example: document classification).

Readout computation and training is as in the case of standard Reservoir Computing approaches:
tree-to-tree (isomorphic transductions)
tree-to-element

65

Root state mapping: the state computed for the root of the input tree.
Mean state mapping: the mean over the states of all the nodes of the input tree.

Page 59

TreeESN: Echo State Property

The recursive reservoir dynamics can be left untrained provided that a contractive property is satisfied.

Echo State Property for trees: for any two initial states, the state computed for the root of the input tree should converge for increasing height of the tree.

Contractive state transition function
Sufficient condition for the ESP for trees: the state transition function being contractive
Necessary condition: the dynamics being stable

[Figure: input trees sharing a common suffix of height h.]

66

Page 60

TreeESN: Markovianity

Contractivity of state dynamics entails Markovianity.
Markovian characterization extended to tree processing.

67

• The reservoir of TreeESN is able to discriminate among input trees in a Markovian tree-suffix-based way, without any training

• Suitable for tasks with target functions compatible with Markovianity

Page 61

TreeESN: Efficiency

68

Computational Complexity

Extremely efficient RC approach: only the linear readout parameters are trained.

Encoding process (for each tree t):
• Scales linearly with the number of nodes and the reservoir dimension
• The same cost for training and test
• Compares well with state-of-the-art methods for trees:
  • RecNNs: extra cost (time + memory) for gradient computations
  • Kernel methods: higher cost of encoding (e.g. quadratic in PT kernels)
(cost factors: number of nodes, max degree, number of reservoir units, degree of connectivity)

Output computation:
• Depends on the method used (e.g. direct, using SVD, or iterative)
• The cost of training the linear TreeESN readout is generally inferior to the cost of training MLPs or SVMs (used in RecNNs and kernel methods)

Page 62

TreeESN: State Space and Applications

69

[Figure: TreeESN state space visualized under root state mapping and under mean state mapping.]

QSPR Analysis of Alkanes

Predict the boiling point of alkanes.
The target is related to global properties of the molecules (number of carbons + branching pattern): non-Markovian.

Page 63

Graph Echo State Networks

Extend the applicability of Reservoir Computing to general graphs.

Two major advantages:
efficiency: training on graph data is computationally intensive!
large class of data: naturally enables processing of cyclic and undirected structures

70

Page 64

GraphESN: Reservoir

Input-driven dynamical system on discrete graphs.
The same reservoir architecture is applied to all the vertices in the structure.

Stability of the state update ensures a solution even in case of cyclic dependencies among state variables.
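The vertex-wise state update is not written out in the transcript; in the GraphESN formulation (Gallicchio & Micheli, 2010) it can be sketched, for a vertex v with label u(v) and neighborhood N(v), as

x(v) = tanh( W_in u(v) + Σ_{v' ∈ N(v)} Ŵ x(v') )

iterated until convergence, which is guaranteed by contractive dynamics.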

71

[Figure: encoding process of a standard ESN vs. a GraphESN: the same local state transition function τ is applied along the time steps of a sequence (ESN) or to each vertex v and its neighbors in the graph (GraphESN).]

Page 65

GraphESN: Echo State Property

A stability constraint is required to achieve usable dynamics.
Foundational idea: resort to contractive dynamics.
Contractive dynamics ensure the convergence of the encoding process (Banach theorem) and enable an iterative computation of the encoding process.

Sufficient condition
Necessary condition

72

Page 66

GraphESN: Markovianity

73

[Figure: the d-neighborhood of a vertex v, e.g. d = 2.]

• Contractivity of the state transition function implies reservoir dynamics with a Markovian flavour
• Suffix: the concept is extended to the set of d-neighbors of a vertex v, i.e. N_d(v)

Markovian characterization of GraphESN dynamics

Page 67

74

Deep Learning and Reservoir Computing

Page 68

Deep Neural Networks

75

Deep Learning is an attractive research area:
Hierarchy of many non-linear models
Ability to learn data representations at different (higher) levels of abstraction

Deep Neural Networks (deepNNs): feed-forward hierarchy of multiple hidden layers of non-linear units

Impressive performance in real-world problems

Page 69

Deep Recurrent Neural Networks

Extension of the deep learning methodology to temporal processing.
Aim at naturally capturing temporal feature representations at different time-scales: text, speech, language processing.
Multi-layered processing of temporal information with feedbacks has a strong biological plausibility.

The analysis of deep RNNs is still young:
Stability: characterize the dynamics of hierarchically organized recurrent models
Deep Reservoir Computing: investigate the actual role of layering in deep recurrent architectures

76

Page 70

Deep RNN Architecture: The role of Layering

77

Constraints to the architecture of a fully connected RNN, by:
Removing connections from the input to higher layers
Removing connections from higher layers to lower ones
Removing connections to layers at levels higher than l+1

Fewer weights to store than in a fully connected RNN

Page 71

Deep Echo State Network

What is the intrinsic role of layering in recurrent architectures?

Develop novel efficient approaches that exploit:
Multiple time-scale representations
Extreme efficiency of training

78

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

untrained

Page 72

DeepESN: Architecture and Dynamics

79

first layer

l-th layer (l>1)
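The layer-wise state equations are not reproduced in the transcript; omitting leaky integration for simplicity, the DeepESN dynamics can be sketched as

first layer:       x^(1)(t) = tanh( W_in u(t) + Ŵ^(1) x^(1)(t−1) )
l-th layer (l>1):  x^(l)(t) = tanh( W^(l) x^(l−1)(t) + Ŵ^(l) x^(l)(t−1) )

where W^(l) denotes the inter-layer weights from layer l−1 to layer l.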

Page 73

DeepESN: Hierarchical Temporal Features

80

Structured representation of temporal data through the deep architecture

Empirical investigations:
Effects of input perturbations last longer in higher layers
Multiple time-scales representation, ordered along the network's hierarchy

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

Page 74

DeepESN: Hierarchical Temporal Features

81

Structured representation of temporal data through the deep architecture

Frequency analysis:
Diversified magnitudes of FFT components
Multiple frequency representation, ordered along the network's hierarchy
Higher layers focus on lower frequencies

C. Gallicchio, A. Micheli, L. Pedrelli, WIRN 2017

[Figure: FFT magnitudes of the reservoir states at layers 1, 4, 7 and 10.]

Page 75

DeepESN: Hierarchical Temporal Features

82

Structured representation of temporal data through the deep architecture

Theoretical analysis:
Higher layers intrinsically implement less contractive dynamics
Deeper networks naturally develop richer dynamics, closer to the edge of chaos
Convenient way of architectural setup

Gallicchio, Micheli. Cognitive Computation (2017).

Gallicchio, Micheli, Silvestri. Neurocomputing 2018.

Page 76

DeepESN: Output Computation

83

The readout can modulate the temporal features at different layers

Page 77

Closing the loops: Deep Tree Echo State Networks

84

Deep Tree Echo State Networks: untrained multi-layered recursive neural network.

Gallicchio, Micheli, IJCNN, 2018.

Page 78

Deep Tree Echo State Networks: Unfolding

85

Page 79

Deep Tree Echo State Networks: Advantages

86

Effective in applications

Extremely efficient

[Figure: layered recursive architecture vs. the shallow case.]

Page 80

Conclusions

Reservoir Computing: a paradigm for efficient modeling of RNNs.
Reservoir: non-linear dynamic component, untrained after contractive initialization.
Readout: linear feed-forward component, trained.
Easy to implement, fast to train.
Markovian flavour of reservoir state dynamics.
Successful applications.
Recent extensions toward: Deep Learning architectures, Structured Domains.

87

Page 81

Research Issues

Optimization of reservoirs: supervised or unsupervised reservoir adaptation (e.g. Intrinsic Plasticity)
Architectural studies: e.g. minimum complexity ESNs, φ-ESNs, orthogonal ESNs, ...
Physical realizations: Photonic Reservoir Computing
Reservoir Computing for learning in Structured Domains: Tree Echo State Networks, Graph Echo State Networks
Deep Reservoir Computing: Deep Echo State Networks
Applications, applications, applications...

88

Page 82

References

Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Tech. Rep. GMD Report 148, German National Research Institute for Computer Science, 2001.

Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80.

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

Lukoševičius, Mantas, and Herbert Jaeger. "Reservoir computing approaches to recurrent neural network training." Computer Science Review 3.3 (2009): 127-149.

Lukoševičius, Mantas. "A practical guide to applying echo state networks." Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 659-686.

89