TRANSCRIPT
Reservoir Computing Approaches
Claudio Gallicchio
1
Introduction
A randomized approach employs a degree of randomness as part of its constructive strategy
Randomness can be viewed as an effective part of ML approaches
Randomization can enhance several aspects of ML methodologies: data processing, learning, hyper-parameter selection
Randomization as a key factor for building efficient ML models
Focus: Randomized Recurrent Neural Network Approaches
2
Randomness in Machine Learning
Randomness can be present at different levels:
To enhance predictive performance
To alleviate difficulties of classical ML methodologies
Randomness in specific aspects of ML:
Data processing
Learning and hyper-parametrization selection
In the core design of the model
3
Randomness in Machine Learning
Data splitting to avoid overfitting: hold-out, k-fold cross validation, leave-one-out
Data generation: bootstrap, balancing techniques
Learning: e.g. observation order
Hyper-parameter selection
4
[Figure: a data set split into training (TR1–TR3), validation (VL1–VL3) and test (TS1–TS3) partitions]
Intrinsically Random Models
Randomness as a key element of the ML model
Random neurons of Restricted Boltzmann Machines
Extremely Randomized Trees, Random Forests
5
Pool/ensemble of randomized decision trees
Each tree is built from samples randomly drawn from the training set
Randomized selection of the input variables to choose for splitting the nodes
Randomized Neural Network Approaches
In the Neural Network (NN) area, randomness gave rise to a rich set of models
Roots are in the pioneering work on the Perceptron
Other pioneering works pointed out the relevance of learning in the output layer w.r.t. the hidden layer
Recently, randomized NNs have delineated a research line of growing interest
Particularly suitable when efficiency is of primary relevance
6
[Figure: Rosenblatt's perceptron scheme — retina, AI projection area, AII association area, responses]
General Architectural Structure
Non-linearly embed the input into a high-dimensional feature space where the problem is more likely to be solved linearly
Theoretically grounded basis (Cover’s Th.)
7
[Figure: input → untrained hidden layer (feature representation) → trained readout layer → output]
Untrained hidden layer: provides the feature representation
Trained readout layer:
• Combines the features in the hidden space for output computation
• Typically implemented by linear models
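As an illustration of this scheme, the following MATLAB/Octave-style sketch (not part of the original slides; matrix names and sizes are illustrative) builds a feed-forward network with a random, untrained hidden layer and a linear readout trained in closed form:

% Sketch: random feature expansion + trained linear readout
% U: Nu x Nsamples input matrix, Ytarget: Ny x Nsamples target matrix (assumed given)
Nh = 100;                                  % hidden layer dimension (illustrative)
Win = 2*rand(Nh, size(U,1)) - 1;           % random, untrained input-to-hidden weights in [-1,1]
H = tanh(Win * U);                         % non-linear random feature representation
lambda_r = 1e-6;                           % readout regularization coefficient (illustrative)
Wout = Ytarget * H' * inv(H*H' + lambda_r*eye(Nh));   % linear readout trained by ridge regression
Ypred = Wout * H;                          % output computation in the hidden feature space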
Dynamical Recurrent Models
Add recurrent connections to a feed-forward architecture in order to deal with temporal data in a natural fashion
Computation is based on dynamical systems
8
dynamical component
feed-forward component
Recurrent Neural Networks (RNNs)
Neural network architecture with explicit recurrent connections
Feedbacks allow the representation of the temporal context in the state (neural memory)
Discrete-time non-autonomous dynamical system
Potentially, the input history can be maintained for arbitrary periods of time
Theoretically very powerful: universal approximation through learning
9
Learning with RNNs (repetita)
Universal approximation of RNNs (e.g. SRN, NARX) through learning
Training algorithms involve some downsides that you already know:
Relatively high computational training costs and potentially slow convergence
Local minima of the error function (which is generally non-convex)
Vanishing of the gradients and problem of learning long-term dependencies
10
Liquid State Machines
W. Maass, T. Natschlaeger, H. Markram (2002)
Originated from the study of biologically inspired spiking neurons
The liquid should satisfy a pointwise separation property
Dynamics provided by a pool of spiking neurons with bio-inspired architecture
11
W. Maass, T. Natschlaeger, and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11), 2531–2560, (2002)
Integrate-and-fire
Izhikevich
Fractal Prediction Machines
P. Tino, G. Dorffner (2001)
Contractive Iterated Function Systems
Fractal Analysis
12
Tino, P., Dorffner, G.: Predicting the future of discrete sequences from fractal representations of the past. Machine Learning 45 (2001) 187-218
Echo State Networks
H. Jaeger (2001)
Control the spectral properties of the recurrence matrix
Echo State Property
13
Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001)
Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (2004) 78-80
Reservoir Computing
14
[Figure: RC architecture — input weight matrix, reservoir weight matrix, readout weight matrix; non-linearity of the expansion in the reservoir]
The untrained reservoir hidden layer is made up of recurrent units
The reservoir implements a pool of random filters that provides memory
15
Echo State Networks
Echo State Network: Architecture
16
Input space, reservoir state space, output space
Reservoir: untrained, large, sparsely connected, non-linear layer
Readout: trained, linear layer
Reservoir: State Computation
17
The reservoir layer implements the state transition function of the dynamical system
It is also useful to consider the iterated version of the state transition function: the reservoir state after the presentation of an entire input sequence
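In the notation used in the following slides (W_in input weight matrix, Ŵ reservoir weight matrix, tanh non-linearity), the state transition function can be written in the standard ESN form
\[
\mathbf{x}(t) = F\big(\mathbf{u}(t), \mathbf{x}(t-1)\big) = \tanh\big(\mathbf{W}_{in}\,\mathbf{u}(t) + \hat{\mathbf{W}}\,\mathbf{x}(t-1)\big)
\]
Its iterated version, denoted $\hat{F}(\mathbf{s}, \mathbf{x}(0))$, gives the reservoir state reached after driving the network, from initial state $\mathbf{x}(0)$, with an entire input sequence $\mathbf{s} = [\mathbf{u}(1), \ldots, \mathbf{u}(N_s)]$.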
Echo State Property (ESP)
A valid ESN should satisfy the "Echo State Property" (ESP)
Def. An ESN satisfies the ESP whenever:
The state of the network asymptotically depends only on the driving input signal
Dependencies on the initial conditions are progressively lost
In the literature, the ESP is characterized by the equivalence among state contractivity, state forgetting and input forgetting
18
Conditions for the ESP
The ESP can be guaranteed by controlling the spectral properties of the recurrent weight matrix Ŵ
Theorem. If the maximum singular value of Ŵ is less than 1, then the ESN satisfies the ESP.
Sufficient condition for the ESP (contractive dynamics)
Theorem. If the spectral radius of Ŵ is greater than 1, then the ESN does not satisfy the ESP.
Necessary condition for the ESP (stable dynamics)
Recall: the spectral radius is the maximum among the absolute values of the eigenvalues
19
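A quick way to check (and enforce) these conditions in MATLAB/Octave, as a sketch with an illustrative reservoir size:

% Sketch: spectral checks on a randomly generated reservoir matrix W
Nr = 100;                        % reservoir dimension (illustrative)
W = 2*rand(Nr,Nr) - 1;           % random recurrent weights in [-1,1]
sigma_max = max(svd(W));         % sufficient condition: largest singular value < 1
rho = max(abs(eig(W)));          % necessary condition: spectral radius < 1
W = 0.9 * (W / rho);             % rescale to a desired spectral radius (here 0.9)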
ESN Initialization: How to setup the Reservoir
20
Elements in W_in are selected randomly in [−scale_in, scale_in]
Ŵ initialization procedure:
Start with a randomly generated matrix Ŵ_rand
Scale Ŵ_rand to meet the condition for the ESP (usually: the necessary one)
ESN Training
21
Run the network on the whole input sequence and collect the reservoir states
Discard an initial transient (washout)
Solve the least squares problem defined by the collected states and the corresponding targets
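Collecting the (post-washout) reservoir states column-wise in a matrix $\mathbf{X}$ and the corresponding targets in $\mathbf{Y}_{target}$, the readout training amounts to the linear least squares problem
\[
\min_{\mathbf{W}_{out}} \ \big\| \mathbf{W}_{out}\,\mathbf{X} - \mathbf{Y}_{target} \big\|_2^2
\]
consistently with the algorithmic description given a few slides below.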
Training the Readout
22
On-line training is not the standard choice for ESNs
Least Mean Squares is typically not suitable: high eigenvalue spread (i.e. large condition number) of X
Recursive Least Squares is more suitable
Off-line training is standard in most applications
Closed-form solution of the least squares problem by direct methods:
Moore-Penrose pseudo-inversion
Possible regularization using random noise
Ridge regression
λ_r is a regularization coefficient
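In closed form, the ridge-regression readout (matching the pseudocode on the ESN algorithmic-description slide) is
\[
\mathbf{W}_{out} = \mathbf{Y}_{target}\,\mathbf{X}^{T}\big(\mathbf{X}\,\mathbf{X}^{T} + \lambda_r\,\mathbf{I}\big)^{-1}
\]
which reduces to the Moore-Penrose pseudo-inverse solution for $\lambda_r = 0$.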
Training the Readout/2
23
Multiple readouts for the same reservoir: solving more than 1 task with the same reservoir dynamics
Other choices for the readout: Multi-layer Perceptron, Support Vector Machine, K-Nearest Neighbor, …
ESN – Algorithmic Description
Initialization
Win = 2*rand(Nr,Nu) - 1;                 % random input weights in [-1,1]
Win = scale_in * Win;                    % apply the input scaling
Wh = 2*rand(Nr,Nr) - 1;                  % random recurrent weights (Nr x Nr)
Wh = rho * (Wh / max(abs(eig(Wh))));     % rescale to the desired spectral radius rho
state = zeros(Nr,1);
X = [];                                  % matrix collecting the reservoir states
Run the reservoir on the input stream
for t = 1:Nt
    state = tanh(Win * u(:,t) + Wh * state);   % u(:,t): input at time step t
    X(:,end+1) = state;
end
Discard the washout
X = X(:,Nwashout+1:end);
Train the readout
Wout = Ytarget(:,Nwashout+1:end) * X' * inv(X*X' + lambda_r*eye(Nr));
The ESN is now ready for operation (estimations/predictions)
24
ESN Hyper-parameterization (Model Selection)
ESNs are easy to implement, efficient to train, but…
Require a careful selection of several hyper-parameters
Major hyper-parameters:
• reservoir dimension
• spectral radius
• input scaling
• readout regularization
Other hyper-parameters:
• reservoir sparsity
• non-linearity of the reservoir activation function
• input bias
• length of the transient/washout
• architectural design
ESN hyper-parametrization should be chosen carefully through an appropriate model selection procedure
25
ESN Major Architectural Variants
26
• direct connections from the input to the readout
• feedback connections from the output to the reservoir: might affect the stability of the network's dynamics; small values are typically used
ESN for sequence-to-element tasks
The learning problem requires one single output for each input sequence
Granularity of the task is on entire sequences (not on time-steps); example: sequence classification
27
[Figure: an input sequence s = [u(1), …, u(N_s)] is mapped through the reservoir states x(1), …, x(N_s) to a single output element y(s); time →]
Last state: $\mathbf{x}(s) = \mathbf{x}(N_s)$
Mean state: $\mathbf{x}(s) = \frac{1}{N_s}\sum_{t=1}^{N_s}\mathbf{x}(t)$
Sum state: $\mathbf{x}(s) = \sum_{t=1}^{N_s}\mathbf{x}(t)$
Leaky Integrator ESN (LI-ESN)
28
Use leaky integrator reservoir units
Apply an exponential moving average to reservoir states: a low-pass filter to better handle input signals that change slowly with respect to the sampling frequency
a is the leaking rate parameter, a ∈ (0,1]; it controls the speed of reservoir dynamics in reaction to the input
Smaller values imply a reservoir that reacts more slowly to input changes
If a = 1, standard ESN dynamics are obtained
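A standard formulation of the leaky-integrator state update (variants exist, e.g. with the leaky term placed inside the non-linearity) is
\[
\mathbf{x}(t) = (1-a)\,\mathbf{x}(t-1) + a\,\tanh\big(\mathbf{W}_{in}\,\mathbf{u}(t) + \hat{\mathbf{W}}\,\mathbf{x}(t-1)\big)
\]
which reduces to the standard ESN update for $a = 1$.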
29
Examples of Applications
Applications of ESNs: Examples /1
Mackey-Glass time series
30
For delay values greater than 16.8 the system has a chaotic attractor; the most used values are 17 and 30
[Figure: ESN performance on the MG17 task, for varying contraction coefficient and reservoir dimension]
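For reference, a commonly used form of the Mackey-Glass equation (with $\tau$ the delay parameter) is
\[
\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x(t-\tau)^{10}} - 0.1\,x(t)
\]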
Applications of ESNs: Examples /2 Forecasting of indoor user movements
31
Generalization of predictive performance to unseen environments
Dataset is available online on the UCI repository: https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data
Deployed WSN: 4 fixed sensors (anchors) & 1 sensor worn by the user (mobile)
Predict if the user will change room when she is in position M
Input: received signal strength (RSS) data from the 4 anchors (10 dimensional vector for each time step, noisy data)
Target: binary classification (change environmental context or not)
http://fp7rubicon.eu/
Applications of ESNs: Examples /2 Forecasting of indoor user movements – Input data
32
example of the RSS traces gathered from all the 4 anchors in the WSN, for different possible movement paths
Applications of ESNs: Examples /3 (repetita)
Human Activity Recognition (HAR) and Localization
33
Input from heterogeneous sensor sources (data fusion)
Predicting event occurrence and confidence
High accuracy of event recognition/indoor localization: > 90% on test data
Effectiveness in learning a variety of HAR tasks
Effectiveness in training on new events
Applications of ESNs: Examples /4 Robotics
34
Indoor localization estimation in critical environment (Stella Maris Hospital)
Precise robot localization estimation using noisy RSSI data (35 cm)
Recalibration in case of environmental alterations or sensor malfunctions
Input: temporal sequences of RSSI values (10 dimensional vector for each time step, noisy data)
Target: temporal sequences of laser-based localization (x,y)
Applications of ESNs: Examples /5
Prediction of the electricity price on the Italian Market
35
Accurate prediction of hourly electricity price (less than 10% MAPE error)
Applications of ESNs: Examples /6
Speech and Text Processing: EVALITA 2014 – Emotion Recognition Track (Sentiment Analysis)
36
[Figure: waveform of the speech signal → average reservoir state → emotion class]
Challenge: the reservoir encodes the temporal input signals, avoiding the need to explicitly resort to fixed-size feature extraction
Promising performance, already in line with the state of the art
7 classes
Applications of ESNs: Examples /7 Human Activity Recognition
37
Classification of human daily activities from RSS data generated by sensors worn by the user
Input: temporal sequences of RSS values (6 dimensional vector for each time step, noisy data)
Target: classification of human activity (bending, cycling, lying, sitting, standing, walking)
Extremely good accuracy (≈ 0.99) and F1 score (≈ 0.96)
2nd Prize at the 2013 EvAAL International Competition
Dataset is available online on the UCI repository: http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29
Applications of ESNs: Examples /8 Health-care monitoring
38
The DOREMI Project
Decrease of cognitive decline, malnutrition and sedentariness in the aging population
Development of a lifestyle management system exploiting intelligent sensor networks to realize smart environments for context awareness
Activity recognition tasks in the context of health monitoring (e.g. balance quality, physical activity, socialization, …)
RC networks used for assessing balance abilities and other relevant health/social parameters/indicators in elderly people
http://www.doremi-fp7.eu/
Project funded by the European Union, Grant agreement n. 611650, Coordination by IFC-CNR
Applications of ESNs: Examples /8
39
Wii Balance Board
BBS
Autonomous Balance Assessment
An unobtrusive automatic system for balance assessment in elderly people
Berg Balance Scale (BBS) test: 14 exercises/items (~30 min.)
Input: stream of pressure data gathered from the 4 corners of the Nintendo Wii Balance Board during the execution of just 1 (out of the 14) BBS exercises
Target: global BBS score of the user (0-56)
The use of RNNs allows to automatically exploit the richness of the signal dynamics
Applications of ESNs: Examples /8
40
Autonomous Balance Assessment
Excellent prediction performance using LI-ESNs
Practical example of how performance can be improved in a real-world case:
By an appropriate design of the task, e.g. inclusion of clinical parameters in the input
By appropriate choices for the network design, e.g. by using a weight sharing approach on the input-to-reservoir connections
LI-ESN model             Test MAE (BBS points)   Test R
standard                 4.80 ± 0.40             0.68
+ weight                 4.62 ± 0.30             0.69
LR weight sharing (ws)   4.03 ± 0.13             0.71
ws + weight              3.80 ± 0.17             0.76
Very good comparison with related models (MLPs, TDNN, RNNs, NARX, …) and with literature approaches
41
Echo State Property
Echo State Property
Assumption: input and state spaces are compact sets
A reservoir network whose state update is ruled by the state transition function satisfies the ESP if initial conditions are asymptotically forgotten
The state dynamics provide "echoes" of the driving input
Essentially, this is a stability condition
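A common formalization of this condition: for every input sequence $\mathbf{s}_N = [\mathbf{u}(1), \ldots, \mathbf{u}(N)]$ and every pair of initial states $\mathbf{x}, \mathbf{x}'$,
\[
\big\| \hat{F}(\mathbf{s}_N, \mathbf{x}) - \hat{F}(\mathbf{s}_N, \mathbf{x}') \big\| \;\to\; 0 \quad \text{as } N \to \infty ,
\]
where $\hat{F}$ denotes the iterated state transition function.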
42
Echo State Property
Why is a stable regime so important?
An unstable network exhibits sensitivity to input perturbations: two slightly different input sequences drive the network into different states
Good for training: the state vectors tend to be more and more linearly separable (for any given task)
Bad for generalization: overfitting! No generalization ability if a temporal sequence similar to one in the training set drives the network into completely different states
43
ESP: Sufficient Condition
The sufficient condition for the ESP analyzes the case of contractive dynamics of the state transition function
Whatever the driving input signal is, if the system is contractive then it will exhibit stability
44
Contractivity
45
The reservoir state transition function rules the evolution of the corresponding dynamical system
Def. The reservoir has contractive dynamics whenever its state transition function F is Lipschitz continuous with constant C < 1
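In formulas, for every input $\mathbf{u}$ and every pair of states $\mathbf{x}, \mathbf{x}'$:
\[
\big\| F(\mathbf{u}, \mathbf{x}) - F(\mathbf{u}, \mathbf{x}') \big\| \;\le\; C\,\big\| \mathbf{x} - \mathbf{x}' \big\| , \qquad C < 1 .
\]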
Contractivity and the ESP
Theorem. If an ESN has a contractive state transition function F, then it satisfies the Echo State Property
Assumption: F is contractive with parameter C < 1
46
Contractivity ⇒ ESP: after n steps, the distance between the states evolved from any two initial conditions is bounded by $C^n$ times their initial distance, which goes to 0 as n goes to infinity, hence the ESP holds
Contractivity and Reservoir Initialization
If the reservoir is initialized to implement a contractive mapping, then the ESP is guaranteed (in any norm, for any input)
Formulation of the sufficient condition for the ESP
Assumptions:
Euclidean distance as metric in the state space
Reservoir units with tanh activation function (note: squashing nonlinearities bound the state space)
47
Contractivity, Markovianity and ESNs
Markovianity and contractivity: Iterated Function Systems, fractal theory, architectural bias of RNNs
RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines
RNN dynamics (before training) are constrained in a region of the state space with Markovian characterization
Contractivity (in any norm) of the state transition function implies Echo States; ESNs are featured by fixed contractive dynamics
Relations with the universality of RC for bounded memory computation (LSM theory)
Markovian nature: states assumed in correspondence of different input sequences sharing a common suffix are close to each other, proportionally to the length of the common suffix
48
Hammer, B., Tino, P.: Recurrent neural networks with small weights implementdefinite memory machines. Neural Computation 15 (2003) 1897-1929
Maass, W., Natschlager, T., Markram, H.: Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14 (2002) 2531-2560
Markovianity Input sequences sharing a common suffix drive the system into close states
The states are close to each other proportionally to the length of the common suffix
Inherent ability to discriminate among different input histories in a suffix-based fashion without training of reservoir weights
Why/when do ESNs work? The target should match the Markovian organization of the reservoir state space
49
Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.
Markovianity: A practical case
Representation of the ESN state space (by PCA)
The network is driven by a symbolic sequence
The reservoir state space naturally organizes in a fractal way
50
[Figure: PCA of reservoir states for σ = 0.3; the 1st principal component reflects the last input symbol, the 2nd principal component the next-to-last input symbol]
Gallicchio, C., Micheli, A. (2010). A Markovian characterization of redundancy in echo state networks by PCA. In Proceedings of ESANN 2010.
ESP: Necessary Condition
Investigating the stability of reservoir dynamics from a dynamical systems perspective
Theorem. If an ESN has unstable dynamics around the zero state and the zero sequence is an admissible input, then the ESP is not satisfied.
Study the stability of the linearized dynamical system around the 0 state and for 0 input:
51
ESP: Necessary Condition
The linearized system now reads:
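For tanh reservoir units, whose Jacobian at the origin is the identity, the linearization around the zero state and for zero input takes the form
\[
\mathbf{x}(t) \approx \hat{\mathbf{W}}\,\mathbf{x}(t-1) .
\]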
0 is a fixed point. Is it stable?
If ρ(Ŵ) < 1 the fixed point is stable
Otherwise: if we start from a state near 0 and drive the network with the null sequence, we do not end up in 0
The null sequence is a counter-example: the ESP does not hold!
52
60
Reservoir Computing for Structured Domains
Learning in Structured Domains
61
• In many real-world application domains the information of interest can be naturally represented by means of structured data representations.
• The problems of interest can be modeled as regression or classification tasks on structured domains.
Sequences Trees Graphs
[Figure: example tasks — MG chaotic time series prediction (sequences), QSPR analysis of alkanes / boiling point (trees), Predictive Toxicology Challenge with targets in {−1, +1} (graphs)]
Learning in Structured Domains
62
Learning in domains of trees and graphs opens up a wide range of research directions:
• Theoretical
• Experimental
Analysis in interdisciplinary areas:
• QSAR/QSPR
• Computational Toxicology, Cheminformatics
• Social and Web information Processing
• Document processing
• Parallel Computation
• …
Problems
Learning in Structured Domains entails a number of open research problems, mainly related to the increasing complexity of the data domains to treat:
• Efficiency
• Generalization of the class of data structures supported
• Adaptivity
• Generalization ability
Tree Echo State Networks
63
Extend the applicability of Reservoir Computing approaches to tree structured data
Extremely efficient approach for modeling Recursive Neural Networks (RecNN)
Architectural and experimental baseline for RecNNs with fully trained recurrent weights
Gallicchio, C., Micheli, A. (2013). Tree echo state networks. Neurocomputing, 101, 319-337.
TreeESN: Reservoir
Large, sparsely connected, untrained layer of non-linear recursive units
Input-driven dynamical system on discrete tree structures
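A sketch of the recursive state transition (the exact formulation in the TreeESN paper may differ in details, e.g. in how child positions are weighted): the state of a node $n$ with label $\mathbf{u}(n)$ and children $ch_1(n), \ldots, ch_K(n)$ is computed as
\[
\mathbf{x}(n) = \tanh\Big(\mathbf{W}_{in}\,\mathbf{u}(n) + \sum_{k=1}^{K} \hat{\mathbf{W}}\,\mathbf{x}\big(ch_k(n)\big)\Big) .
\]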
64
TreeESN: State Mapping and Readout
State mapping function for tree-to-element tasks: one single output vector (unstructured) is required for each input tree; example: document classification
Readout computation and training is as in the case of standard Reservoir Computing approaches: tree-to-tree (isomorphic transductions) and tree-to-element tasks
65
[Figure: Root State Mapping and Mean State Mapping]
TreeESN: Echo State Property
The recursive reservoir dynamics can be left untrained provided that a contractive property is satisfied
Echo State Property for trees: for any two initial states, the state computed for the root of the input tree should converge for increasing height of the tree (input trees sharing a common suffix of height h)
Sufficient condition for the ESP for trees: the state transition function being contractive
Necessary condition: the dynamics being stable
66
TreeESN: Markovianity
Contractivity of state dynamics entails Markovianity
Markovian characterization extended to tree processing
67
• The reservoir of TreeESN is able to discriminate among input trees in a Markovian tree suffix-based way, without any training
• Suitable for tasks with target functions compatible with Markovianity
TreeESN: Efficiency
68
Computational Complexity
Extremely efficient RC approach: only the linear readout parameters are trained
Encoding process (for each tree t):
• scales linearly with the number of nodes and the reservoir dimension (relevant factors: number of nodes, max degree, number of reservoir units, degree of connectivity)
• the same cost for training and test
• compares well with state-of-the-art methods for trees:
  • RecNNs: extra cost (time + memory) for gradient computations
  • kernel methods: higher cost of encoding (e.g. quadratic in PT kernels)
Output computation:
• depends on the method used (e.g. direct using SVD, or iterative)
• the cost of training the linear TreeESN readout is generally inferior to the cost of training MLPs or SVMs (used in RecNNs and kernels)
TreeESN: State Space and Applications
69
[Figure: TreeESN state space under root state mapping vs. mean state mapping]
QSPR Analysis of Alkanes
Predict the boiling point of alkanes
Target is related to global properties of the molecules (number of carbons + branching pattern): non-Markovian
Graph Echo State Networks
Extend the applicability of Reservoir Computing to general graphs
Two major advantages:
• efficiency – training on graph data is computationally intensive!
• large class of data – naturally enables processing of cyclic and undirected structures
70
GraphESN: Reservoir
Input-driven dynamical system on discrete graphs
The same reservoir architecture is applied to all the vertices in the structure
Stability of the state update ensures a solution even in case of cyclic dependencies among state variables
71
[Figure: unfolding of the state transition function τ — standard ESN (over a sequence) vs. GraphESN (over the vertices v of a graph)]
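A sketch of the GraphESN state computation (notation assumed here, in the same spirit as the ESN case): the state of a vertex $v$ with label $\mathbf{u}(v)$ and neighborhood $\mathcal{N}(v)$ is obtained by iterating
\[
\mathbf{x}(v) \leftarrow \tanh\Big(\mathbf{W}_{in}\,\mathbf{u}(v) + \sum_{v' \in \mathcal{N}(v)} \hat{\mathbf{W}}\,\mathbf{x}(v')\Big)
\]
until convergence, which is guaranteed by the contractivity of the update (Banach theorem), as discussed in the next slides.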
GraphESN: Echo State Property
A stability constraint is required to achieve usable dynamics
Foundational idea: resort to contractive dynamics
Contractive dynamics ensure the convergence of the encoding process (Banach theorem) and enable an iterative computation of the encoding process
Sufficient condition / Necessary condition
72
GraphESN: Markovianity
73
• Contractivity of the state transition function implies reservoir dynamics with a Markovian flavour
• Suffix: the concept is extended to the set of d-neighbors of a vertex v, i.e. $N_d(v)$ (e.g. d = 2)
Markovian Characterization of GraphESN Dynamics
74
Deep Learning and Reservoir Computing
Deep Neural Networks
75
Deep Learning is an attractive research area
Hierarchy of many non-linear models
Ability to learn data representations at different (higher) levels of abstraction
Deep Neural Networks (deepNNs): feed-forward hierarchy of multiple hidden layers of non-linear units
Impressive performance in real-world problems
Deep Recurrent Neural Networks
Extension of the deep learning methodology to temporal processing
Aim at naturally capturing temporal feature representations at different time-scales: text, speech, language processing
Multi-layered processing of temporal information with feedbacks has a strong biological plausibility
The analysis of deep RNNs is still young:
Stability: characterize the dynamics of hierarchically organized recurrent models
Deep Reservoir Computing: investigate the actual role of layering in deep recurrent architectures
76
Deep RNN Architecture: The role of Layering
77
Constraints to the architecture of a fully connected RNN, obtained by:
• removing connections from the input to higher layers
• removing connections from higher layers to lower ones
• removing connections to layers at levels higher than l+1
Fewer weights to store than in a fully connected RNN
Deep Echo State Network
What is the intrinsic role of layering in recurrent architectures?
Develop novel efficient approaches to exploit:
• multiple time-scales representations
• extreme efficiency of training
78
C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017
DeepESN: Architecture and Dynamics
79
first layer
l-th layer (l>1)
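In the formulation of the cited DeepESN papers (leaky integration omitted here for simplicity), the dynamics of the stacked reservoirs can be sketched as
\[
\mathbf{x}^{(1)}(t) = \tanh\big(\mathbf{W}_{in}\,\mathbf{u}(t) + \hat{\mathbf{W}}^{(1)}\,\mathbf{x}^{(1)}(t-1)\big)
\]
\[
\mathbf{x}^{(l)}(t) = \tanh\big(\mathbf{W}^{(l)}\,\mathbf{x}^{(l-1)}(t) + \hat{\mathbf{W}}^{(l)}\,\mathbf{x}^{(l)}(t-1)\big), \qquad l > 1,
\]
where $\mathbf{W}^{(l)}$ collects the inter-layer connections from layer $l-1$ to layer $l$ and $\hat{\mathbf{W}}^{(l)}$ the recurrent weights of layer $l$.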
DeepESN: Hierarchical Temporal Features
80
Structured representation of temporal data through the deep architecture
Empirical investigations:
Effects of input perturbations last longer in higher layers
Multiple time-scales representation, ordered along the network's hierarchy
C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017
DeepESN: Hierarchical Temporal Features
81
Structured representation of temporal data through the deep architecture
Frequency analysis:
Diversified magnitudes of FFT components
Multiple frequency representation, ordered along the network's hierarchy
Higher layers focus on lower frequencies
C. Gallicchio, A. Micheli, L. Pedrelli, WIRN 2017
[Figure: FFT components at layers 1, 4, 7 and 10]
DeepESN: Hierarchical Temporal Features
82
Structured representation of temporal data through the deep architecture
Theoretical analysis:
Higher layers intrinsically implement less contractive dynamics
Deeper networks naturally develop richer dynamics, closer to the edge of chaos
A convenient way of architectural setup
Gallicchio, Micheli. Cognitive Computation (2017).
Gallicchio, Micheli, Silvestri. Neurocomputing 2018.
DeepESN: Output Computation
83
The readout can modulate the temporal features at different layers
Closing the loops: Deep Tree Echo State Networks
84
Deep Tree Echo State Networks
Untrained multi-layered recursive neural network
Gallicchio, Micheli, IJCNN, 2018.
Deep Tree Echo State Networks: Unfolding
85
Deep Tree Echo State Networks: Advantages
86
Effective in applications
Extremely efficient
[Figure: layered recursive architecture vs. the shallow case]
Conclusions
Reservoir Computing: a paradigm for efficient modeling of RNNs
Reservoir: non-linear dynamic component, untrained after contractive initialization
Readout: linear feed-forward component, trained
Easy to implement, fast to train
Markovian flavour of reservoir state dynamics
Successful applications
Recent extensions toward: Deep Learning architectures, Structured Domains
87
Research Issues
Optimization of reservoirs: supervised or unsupervised reservoir adaptation (e.g. Intrinsic Plasticity)
Architectural studies: e.g. minimum complexity ESNs, φ-ESNs, orthogonal ESNs, …
Physical realizations: photonic Reservoir Computing
Reservoir Computing for learning in Structured Domains: Tree Echo State Networks, Graph Echo State Networks
Deep Reservoir Computing: Deep Echo State Networks
Applications, applications, applications…
88
References
Jaeger H. The "echo state" approach to analysing and training recurrent neural networks – with an erratum note. Tech. Rep. GMD Report 148, German National Research Center for Information Technology, 2001.
Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80.
Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.
Lukoševičius, Mantas, and Herbert Jaeger. "Reservoir computing approaches to recurrent neural network training." Computer Science Review 3.3 (2009): 127-149.
Lukoševičius, Mantas. "A practical guide to applying echo state networks." Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 659-686.
89