
Accepted Manuscript

Asymptotic Fisher Memory of Randomized Linear Symmetric Echo State Networks

Peter Tino

PII: S0925-2312(18)30220-0
DOI: 10.1016/j.neucom.2017.11.076
Reference: NEUCOM 19364

To appear in: Neurocomputing

Received date: 14 July 2017
Revised date: 22 October 2017
Accepted date: 26 November 2017

Please cite this article as: Peter Tino, Asymptotic Fisher Memory of Randomized Linear Symmetric Echo State Networks, Neurocomputing (2018), doi: 10.1016/j.neucom.2017.11.076

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Asymptotic Fisher Memory of Randomized Linear Symmetric Echo State Networks

Peter Tino

School of Computer Science, The University of Birmingham, Birmingham B15 2TT, United Kingdom
E-mail: P.T [email protected]

Abstract

We study asymptotic properties of the Fisher memory of linear Echo State Networks with randomized symmetric state space coupling. In particular, two reservoir constructions are considered: (1) a more direct dynamic coupling construction using a class of Wigner matrices and (2) a positive semi-definite dynamic coupling obtained as a product of unconstrained stochastic matrices. We show that the maximal Fisher memory is achieved when the input-to-state coupling is collinear with the dominant eigenvector of the reservoir coupling matrix. In the case of Wigner reservoirs we show that as the system size grows, the contribution to the Fisher memory of self-coupling of reservoir units is negligible. We also prove that when the input-to-state coupling is collinear with the sum of eigenvectors of the state space coupling, the expected normalized memory is four and eight times smaller than the maximal memory value for the Wigner and product constructions, respectively.

Keywords: Fisher memory of dynamical systems, Recurrent neural network, Echo state network, Reservoir Computing

1. Introduction

Input driven dynamical systems play a prominent role in machine learning as models applied to time series data, e.g. [2, 10, 21, 15]. There has been a lively research activity on formulating and assessing different aspects of computational power and information processing in such systems (see e.g. [5, 16]). For example, tools of information theory have been used to assess information storage or transfer within systems of this kind [13, 14, 17, 3].


Alternatively, dynamical systems have been assessed as feature generators for machine learning algorithms in terms of class separability (in sequence classification problems) or learnability [12].

To specifically characterize the capability of input-driven dynamical systems to keep in their state space information about past inputs, several memory quantifiers were proposed, for example the short term memory capacity [9] and the Fisher memory curve [6]. Even though those two measures have been developed from completely different perspectives, deep connections exist between them [20]. The concept of memory capacity, originally developed for univariate input streams, was generalized to multivariate inputs in [8]. Couillet et al. [4] rigorously studied the mean-square error of linear dynamical systems used as dynamical filters in regression tasks and suggested memory quantities that generalize the short term memory capacity and Fisher memory curve measures. Finally, Ganguli and Sompolinsky [7] showed an interesting connection between memory in dynamical systems and their capacity to perform dynamical compressed sensing of past inputs.

In this contribution we concentrate on the Fisher memory of linear dynamical systems with symmetric coupling. In Echo State Networks (ESN) [15] large state space dimensionalities with random dynamical couplings are typically used and a linear readout from the state space forms the only trainable part of the model. It is therefore important to characterize important large scale properties of Fisher memory in such systems (as the state space dimensionality grows) and study optimal settings of input-to-state couplings that maximize the memory. In particular, we rigorously study the Fisher memory of two subclasses of linear input driven dynamical systems with symmetric dynamical coupling - a direct dynamic coupling construction using a class of Wigner matrices (section 3) and a positive semi-definite dynamic coupling obtained as a product of unconstrained stochastic matrices (section 4).

2. Fisher memory curve of linear dynamical systems

We consider linear input driven dynamical systems with N-dimensional state space and univariate inputs and outputs with randomized symmetric dynamic coupling.

In the ESN metaphor, the state dimensions correspond to reservoir units coupled to the input s(t) and output y(t) through N-dimensional weight vectors $v \in \mathbb{R}^N$ and $r \in \mathbb{R}^N$, respectively. Denoting the state vector at time t by $x(t) \in \mathbb{R}^N$, the dynamical system (reservoir activations) evolves as

$$x(t) = v\, s(t) + W x(t-1) + z(t), \qquad (1)$$

where $W \in \mathbb{R}^{N \times N}$ is an N × N weight matrix providing the dynamical coupling and z(t) are zero-mean noise terms. Parameters r of the adaptive linear readout, $y(t) = r^T x(t)$, are typically trained (offline or online) by minimizing the (normalized) mean square error between the targets and the reservoir readouts y(t). For our analysis, however, the readout part of the ESN architecture is not needed.

In ESN, the elements of W and v are fixed prior to training, often at random, with entries drawn from a distribution symmetric with respect to the origin. The reservoir connection matrix W is typically scaled to a prescribed spectral radius < 1, although in this study we assume that the parameters of the distribution over W are set so that asymptotically, almost surely, W is a contractive linear operator.
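For concreteness, the dynamics (1) can be simulated directly. The following minimal sketch uses illustrative values for N, the noise level and the spectral radius; none of these are prescribed by the paper.

```python
import numpy as np

# Minimal sketch of the linear reservoir dynamics (1) with a randomized
# symmetric coupling; N, T, eps and the 0.9 spectral radius are illustrative.
rng = np.random.default_rng(0)
N, T, eps = 100, 200, 0.01

Q = rng.normal(size=(N, N))
Q = (Q + Q.T) / 2                                  # symmetrize the coupling
W = 0.9 * Q / np.abs(np.linalg.eigvalsh(Q)).max()  # spectral radius 0.9 < 1

v = rng.normal(size=N)
v *= np.sqrt(N) / np.linalg.norm(v)    # input weights of norm sqrt(N)

s = rng.normal(size=T)                 # univariate input stream
x = np.zeros(N)
for t in range(T):
    z = rng.normal(scale=np.sqrt(eps), size=N)  # dynamic noise, covariance eps*I
    x = v * s[t] + W @ x + z                    # state update of eq. (1)
```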

In [6] Ganguli, Huh and Sompolinsky proposed a particular way of quantifying the amount of memory preserved in linear input driven dynamical systems corrupted by a memoryless Gaussian i.i.d. dynamic noise¹ z(t). In particular, z(t) is zero mean with covariance $\epsilon I$, $\epsilon > 0$, where I is the N × N identity matrix. Under such dynamic noise, given an input driving stream $s(..t) = \ldots\, s(t-2)\, s(t-1)\, s(t)$, the input-conditional state distribution

$$p(x(t) \mid \ldots\, s(t-2)\, s(t-1)\, s(t))$$

is a Gaussian with covariance [6]

$$C = \epsilon \sum_{\ell=0}^{\infty} W^{\ell} (W^T)^{\ell}. \qquad (2)$$

Sensitivities of $p(x(t) \mid s(..t))$ with respect to small perturbations in the input driving stream s(..t) (parameters of the dynamical system remain fixed) are collected in the Fisher memory matrix F with elements

$$F_{k,l}(s(..t)) = -E_{p(x(t) \mid s(..t))}\left[ \frac{\partial^2}{\partial s(t-k)\, \partial s(t-l)} \log p(x(t) \mid s(..t)) \right]$$

and its diagonal elements $J_N(k) = F_{k,k}(s(..t))$ quantify the information that the state distribution $p(x(t) \mid s(..t))$ retains about a change (e.g. a pulse) entering the network k > 0 time steps in the past. The collection of terms $\{J_N(k)\}_{k=0}^{\infty}$ was termed the Fisher memory curve (FMC) and evaluated to [6]

$$J_N(k; W, v) = v^T (W^T)^k C^{-1} W^k v, \qquad (3)$$

where in the notation $J_N(k; W, v)$ we made explicit the dependence of the FMC on the dynamic and input couplings W and v, respectively.

¹ As customary in the dynamical systems literature, we distinguish between the "observational" and "dynamic" noise. Observational noise refers to the noise applied to readouts from the state space in the process of their measurement. This noise does not corrupt the underlying dynamics of the system. On the other hand, dynamic noise corrupts the system dynamics in the state space. The term dynamic noise does not in this case refer to the possibility of its distribution changing in time.

Analogously to the memory capacity of dynamical systems [9], we extend the Fisher memory curve to a global memory quantification,

$$J_N(W_N, v) = \sum_{k=1}^{\infty} J_N(k; W_N, v).$$

We will refer to $J_N(W_N, v)$ as the Fisher memory of the underlying dynamical system. Obviously, increasing the state space dimension N will increase the amount of memory that can be usefully captured by the dynamical system (1). To remove this bias, we introduce a new quantity, the normalized Fisher memory, which measures the amount of memory realisable by the dynamical system per state space dimension:

$$\bar{J}_N(W_N, v) = \frac{1}{N} J_N(W_N, v).$$

In the following we study asymptotic properties of the normalized Fisher memory as the state space dimensionality grows and ask what kind of input coupling v is needed to maximize its expectation. Again, it is important to realize that as the state space dimensionality N grows, so does the input weight dimensionality. Keeping the input weight norm constant while increasing the state space dimensionality would result in diminishing individual weights. To normalize the scales, so that asymptotic statements can be made, we will require that the input weights live on the (N-1)-dimensional hypersphere, $v \in S_{N-1}(\sqrt{N})$, where for r > 0,

$$S_{N-1}(r) = \{ v \in \mathbb{R}^N \mid \|v\|_2 = r \}.$$
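These quantities are straightforward to evaluate numerically for a given contractive coupling W. The sketch below accumulates the covariance C from the series in eq. (2) and returns the FMC terms of eq. (3); the function name, truncation tolerance and default parameters are our own illustrative choices.

```python
import numpy as np

def fisher_memory_curve(W, v, eps=0.01, k_max=100, tol=1e-14):
    """FMC terms J_N(k) of eq. (3), k = 1..k_max, for a contractive W."""
    N = W.shape[0]
    C, Wl = np.zeros((N, N)), np.eye(N)
    while True:                      # C = eps * sum_l W^l (W^T)^l, eq. (2)
        term = Wl @ Wl.T
        C += eps * term
        if np.abs(term).max() < tol: # truncate series once terms are tiny
            break
        Wl = W @ Wl
    Cinv = np.linalg.inv(C)
    J, Wk = [], W.copy()
    for _ in range(k_max):
        u = Wk @ v                   # u = W^k v
        J.append(u @ Cinv @ u)       # J_N(k) = v^T (W^T)^k C^{-1} W^k v
        Wk = W @ Wk
    return np.array(J)

# Normalized Fisher memory: sum the curve and divide by N, e.g.
# J_bar = fisher_memory_curve(W, v).sum() / W.shape[0]
```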


3. Wigner ESN

Theory of random matrices has undergone considerable development, see e.g. [19]. In this contribution we will study dynamical systems with randomized coupling constrained to the class of Wigner matrices (e.g. [1]). Let $Q_N$ be a random symmetric N × N matrix with "upper triangular" off-diagonal elements $Q_{i,j}$, $1 \le i < j \le N$, distributed i.i.d. with zero mean and finite moments - in particular, of variance $\sigma_o^2 > 0$. Diagonal elements $Q_{i,i}$, $1 \le i \le N$, of $Q_N$ are distributed i.i.d. with a zero-mean distribution of finite moments and variance $\sigma_d^2 > 0$. The elements below the diagonal are copies of their symmetric counterparts: for $1 \le j < i \le N$, $Q_{i,j} = Q_{j,i}$. Asymptotic properties of such matrices have been intensively studied, in particular the convergence of eigenvalues as $N \to \infty$. It can be shown that in the general case, scaling down of random matrices is necessary to ensure convergence of their spectral properties [1]:

$$W_N = \frac{1}{\sqrt{N}} Q_N.$$

We will refer to ESN with dynamical coupling $W_N$ as Wigner Echo State Networks. We are now ready to state the first result concerning the maximal Fisher memory of Wigner ESNs.

Theorem 1: Consider a sequence of Wigner dynamical systems (1) with couplings $\{W_N\}_{N>1}$. The maximum normalized Fisher memory is attained when, for every realization of the Wigner coupling $W_N$, the input weights v are collinear with the dominant eigenvector² of $W_N$. In that case, as $N \to \infty$, almost surely,

$$\bar{J}_N(W_N, v) \to \frac{4}{\epsilon}\, \sigma_o^2.$$

² The eigenvector corresponding to the maximal eigenvalue.

Proof: For a fixed N, let $W_N$ be a realization of the Wigner coupling. Since $W_N$ is symmetric, it can be diagonalised,

$$W_N = U_N \Lambda_N U_N^T, \qquad \Lambda_N = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N). \qquad (4)$$

Without loss of generality assume $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Columns of $U_N$ are eigenvectors $\{u_i\}_{i=1}^{N}$ of $W_N$, forming an orthonormal basis of $\mathbb{R}^N$. Let $\tilde{v}$ be the expression of the input weights v in this basis, i.e. $\tilde{v} = U_N^T v$. It has been shown in [20] that for symmetric dynamic couplings,

$$J_N(k; W_N, v) = \frac{1}{\epsilon} \sum_{i=1}^{N} \tilde{v}_i^2\, \lambda_i^{2k} (1 - \lambda_i^2).$$

We therefore have

$$\epsilon \cdot J_N(W_N, v) = \sum_{k=1}^{\infty} \sum_{i=1}^{N} \tilde{v}_i^2\, \lambda_i^{2k} (1 - \lambda_i^2) = \sum_{i=1}^{N} \tilde{v}_i^2 (1 - \lambda_i^2) \sum_{k=1}^{\infty} \lambda_i^{2k} = \sum_{i=1}^{N} \tilde{v}_i^2\, \lambda_i^2. \qquad (5)$$

Letting $N^{-1/2} v$ be the dominant eigenvector $u_1$ of $W_N$ results in

$$\tilde{v} = \sqrt{N}\, U_N^T u_1 = \sqrt{N}\, e_1,$$

where $e_i$ is the i-th standard basis vector, i.e. $e_i \in \mathbb{R}^N$ is the vector of 0's, except for the value 1 at index i. We thus have

$$\epsilon \cdot J_N(W_N, u_1) = N \lambda_1^2$$

and hence $\epsilon \cdot \bar{J}_N(W_N, u_1) = \lambda_1^2$. Now, the maximal eigenvalue of Wigner matrices converges to $2\sigma_o$ almost surely [1], giving the almost sure convergence of $\bar{J}_N(W_N, u_1)$ to $4\sigma_o^2/\epsilon$.

To show that collinearity of the input weights $v \in S_{N-1}(\sqrt{N})$ with the dominant eigenvector of $W_N$ is the optimal setting, we note that since $\tilde{v}$ expresses v in another orthonormal basis $U_N$, the norm is preserved, $\|\tilde{v}\|_2 = \|v\|_2$. Hence, $\tilde{v} \in S_{N-1}(\sqrt{N})$. In line with eq. (5) we therefore consider the following optimization problem:

$$\max_{q \in S_{N-1}(1)}\; N \sum_{i=1}^{N} q_i^2\, \lambda_i^2.$$

The reparametrization $a_i = q_i^2$, $b_i = \lambda_i^2$, ignoring the constant scaling, leads to

$$\max_{a \in Z_{N-1}} b^T a, \qquad (6)$$

which is a linear optimization problem on the simplex

$$Z_{N-1} = \{ a \in [0,1]^N \mid \|a\|_1 = 1 \}.$$

Let the largest element of b be $b_{i^*}$, i.e. $i^* = \arg\max_i b_i$. Then the quantity in (6) is maximized when $a = e_{i^*}$, in other words, $a_{i^*} = 1$ and $a_j = 0$, $j \ne i^*$. In our case $i^* = 1$ and so the non-zero element of a is $a_1 = q_1^2 = 1$. It follows that $\tilde{v}_1^2 = N$ and $\tilde{v}_j = 0$, $j = 2, 3, \ldots, N$. This directly implies that $v \in S_{N-1}(\sqrt{N})$ and is collinear with $u_1$. □
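Theorem 1 can be probed numerically: drawing a single scaled Wigner coupling and setting the input weights collinear with the dominant eigenvector, eq. (5) gives $\epsilon \cdot \bar{J}_N = \lambda_1^2$, which should be close to the limit $4\sigma_o^2$. The parameter values in this sketch are illustrative, not taken from the paper.

```python
import numpy as np

# Numerical probe of Theorem 1; N and sigma_o are illustrative choices.
rng = np.random.default_rng(1)
N, sigma_o = 2000, 0.3

Q = np.triu(rng.normal(scale=sigma_o, size=(N, N)), 1)
Q = Q + Q.T                                      # symmetric off-diagonal part
np.fill_diagonal(Q, rng.normal(scale=sigma_o, size=N))  # here sigma_d = sigma_o
W = Q / np.sqrt(N)                               # Wigner scaling

lam1 = np.linalg.eigvalsh(W)[-1]                 # dominant eigenvalue
# By eq. (5), with v = sqrt(N) u_1: eps * J_bar = lambda_1^2 -> 4 sigma_o^2
print(lam1**2, "vs limit", 4 * sigma_o**2)       # both ~0.36
```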

The last result imposes an asymptotic upper bound on the normalized Fisher memory of Wigner ESNs. Obviously, $\bar{J}_N(W_N, v)$ can be made vanishingly small by making the input weights v collinear with the least significant eigenvector of $W_N$ (see the semicircular law of eigenvalue distribution for Wigner matrices [1]). In the following we ask to what degree the Fisher memory of (1) will degrade if instead of the dominant eigenvector the input weights are made collinear with the sum of eigenvectors of $W_N$.

Theorem 2: Consider a sequence of Wigner dynamical systems (1) with couplings $\{W_N\}_{N>1}$. For every realization of the Wigner coupling $W_N$, let the input weights v be collinear with the sum of eigenvectors of $W_N$. Then, as $N \to \infty$, for the expected normalized Fisher memory we have

$$E_{W_N}[\bar{J}_N(W_N, v)] \to \frac{1}{\epsilon}\, \sigma_o^2.$$

Proof: In this case, $v = \sum_{i=1}^{N} u_i \in S_{N-1}(\sqrt{N})$ and $\tilde{v}_i = 1$, $i = 1, 2, \ldots, N$. By (5),

$$\|W_N\|_F^2 = \sum_{i,j=1}^{N} W_{N,ij}^2 = \sum_{i=1}^{N} \lambda_i^2 = \epsilon \cdot J_N(W_N, v),$$

where $\| \cdot \|_F$ is the Frobenius norm. This implies

$$\epsilon \cdot E_{W_N}[\bar{J}_N(W_N, v)] = \frac{1}{N} \sum_{i,j=1}^{N} E[W_{N,ij}^2] = \frac{1}{N}\, \sigma_d^2 + \frac{N-1}{N}\, \sigma_o^2$$

and the result follows from sending $N \to \infty$. □
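The Frobenius-norm identity in the proof makes Theorem 2 easy to check by Monte Carlo simulation. The following sketch, with toy parameter values of our own choosing, averages $\epsilon \cdot \bar{J}_N = \|W_N\|_F^2 / N$ over independent Wigner realizations.

```python
import numpy as np

# Monte Carlo check of Theorem 2 (toy parameters): with v collinear with the
# sum of eigenvectors, eps * J_bar = ||W_N||_F^2 / N, whose expectation equals
# sigma_d^2/N + (N-1)/N * sigma_o^2 -> sigma_o^2.
rng = np.random.default_rng(2)
N, sigma_o, sigma_d, trials = 500, 0.3, 0.6, 50

vals = []
for _ in range(trials):
    Q = np.triu(rng.normal(scale=sigma_o, size=(N, N)), 1)
    Q = Q + Q.T
    np.fill_diagonal(Q, rng.normal(scale=sigma_d, size=N))
    W = Q / np.sqrt(N)
    vals.append(np.sum(W**2) / N)      # eps * J_bar for this realization
print(np.mean(vals), "vs limit", sigma_o**2)
```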

4. Symmetric Positive Semi-definite Reservoirs

In the previous section, the symmetry of randomized dynamic reservoirs was imposed directly by stipulating that the matrix elements $Q_{i,j}$ and $Q_{j,i}$ must be equal. This is achieved by randomly generating $Q_{i,j}$ above the diagonal and then copying $Q_{i,j}$ to their symmetric counterparts $Q_{j,i}$ below the diagonal (Wigner random matrices). In this section we ask whether the nature of the results derived for Wigner ESN changes if another, less direct construction of randomized symmetric reservoirs is used. In particular, we generate $Q_N$ as

$$Q_N = Y_N^T Y_N, \qquad (7)$$

where $Y_N$ is a random N × N matrix with elements $Y_{i,j}$, $1 \le i, j \le N$, distributed i.i.d. with zero mean and finite moments - in particular, of variance $\sigma_Y^2 > 0$. As before, to ensure convergence of spectral properties of the random matrices $Y_N$, a scaling factor of $N^{-1/2}$ needs to be applied to $Y_N$ [1], resulting in:

$$W_N = \frac{1}{N} Q_N. \qquad (8)$$

Note that each realization of $Y_N$ yields positive semi-definite matrices $Q_N$ and $W_N$.

Recall that to show that the optimal (i.e. Fisher memory maximizing) setting of the input weights (up to the necessary scaling) is the leading eigenvector of $W_N$, we only needed the symmetry of $W_N$. Considering that the individual items of the random matrix are generated i.i.d. from a zero-mean distribution with variance $\sigma_Y^2$, the question is to what extent the particular randomized construction of $W_N$ matters when asymptotic properties of Fisher memory are considered. We have the following result:

Theorem 3: Consider a sequence of dynamical systems (1) with randomized positive semi-definite couplings $\{W_N\}_{N>1}$ given by (7)-(8). If the input weights v are collinear with the dominant eigenvector of $W_N$, then as $N \to \infty$, almost surely,

$$\bar{J}_N(W_N, v) \to \frac{16}{\epsilon}\, \sigma_Y^4.$$

Proof: Let $s_{\max}(Y_N)$ be the maximal singular value of the N × N matrix $Y_N$. As $N \to \infty$,

$$\frac{1}{\sqrt{N}}\, s_{\max}(Y_N) \to 2\sigma_Y$$

almost surely [18]. This implies that the maximum eigenvalue $\lambda_{\max}(W_N)$ of $W_N$ approaches $4\sigma_Y^2$.

We have shown in the proof of Theorem 1 that if the input weight is collinear with the dominant eigenvector of $W_N$, we have $\epsilon \cdot \bar{J}_N = \lambda_{\max}^2(W_N)$. Hence, as $N \to \infty$, $\bar{J}_N(W_N, v)$ converges to $16 \sigma_Y^4/\epsilon$ almost surely. □

Theorem 3 demonstrates that the asymptotic properties of Fisher memory translate directly to the case of symmetric randomized dynamical couplings constructed as products of unconstrained random matrices. Of course, since $W_N$ is now constructed as a product of random matrices, it is natural that the asymptotic Fisher memory is expressed as a square of the expression in Theorem 1. Indeed, consider an alternative construction of the symmetric reservoir:

$$Q_N = \left[ Y_N^T Y_N \right]^{\frac{1}{2}}. \qquad (9)$$

Each realization of $Y_N$ yields a positive semi-definite matrix $Y_N^T Y_N$ that has a unique square root - a realization of $Q_N$. Again, to ensure convergence of spectral properties of the random matrices $Y_N$, a scaling factor of $N^{-1/2}$ needs to be applied:

$$W_N = \frac{1}{\sqrt{N}} Q_N. \qquad (10)$$

Now, since $N^{-1/2} s_{\max}(Y_N) \to 2\sigma_Y$ almost surely, as $N \to \infty$,

$$\frac{1}{N}\, \lambda_{\max}(Y_N^T Y_N) \to 4\sigma_Y^2.$$

Hence, the maximum eigenvalue $\lambda_{\max}(W_N)$ of $W_N$ approaches $2\sigma_Y$. It follows that if the input weight is collinear with the dominant eigenvector of $W_N$, we have $\epsilon \cdot \bar{J}_N = \lambda_{\max}^2(W_N)$ and, as $N \to \infty$, $\bar{J}_N(W_N, v)$ converges to $4\sigma_Y^2/\epsilon$ almost surely. This is in direct correspondence with the result of Theorem 1 for Wigner ESN. In this case, irrespective of whether the symmetry of the randomized dynamical coupling is obtained directly (Wigner ESN) or indirectly as a product of unconstrained random matrices, the maximal normalized Fisher memory approaches $\frac{4}{\epsilon}\sigma^2$ and is determined solely by the second moment of the zero-mean random variables generating the strengths of the dynamic couplings within the reservoir.

We now turn our attention to the case of the input weight vector collinear with the sum of eigenvectors of $W_N$. We have shown that, compared to the maximum Fisher memory (input weight vector collinear with the leading eigenvector of $W_N$), this results in the asymptotic Fisher memory being reduced by a factor of 4 (see Theorems 1 and 2). Is this a property of the particular randomized construction of symmetric dynamic coupling as a Wigner matrix, or does it reflect a more general tendency?

Theorem 4: Consider a sequence of dynamical systems (1) with randomized positive semi-definite couplings $\{W_N\}_{N>1}$ given by (7)-(8). Assume that the entries of $Y_N$ are generated i.i.d. from a symmetric distribution with zero mean, variance $\sigma_Y^2$ and finite first four moments. For every realization of the dynamic coupling $W_N$, let the input weights v be collinear with the sum of eigenvectors of $W_N$. Then, as $N \to \infty$, for the expected normalized Fisher memory we have

$$E_{W_N}[\bar{J}_N(W_N, v)] \to \frac{2}{\epsilon}\, \sigma_Y^4.$$

Proof: In the proof of Theorem 2 we have shown that if the input weight vector v is collinear with the sum of eigenvectors of $W_N$, then

$$\epsilon \cdot J_N(W_N, v) = \sum_{i=1}^{N} \lambda_i^2, \qquad (11)$$

where the $\lambda_i$'s are eigenvalues of $W_N$. Furthermore,

$$\sum_{i=1}^{N} \lambda_i^2 = \mathrm{trace}(W_N^2), \qquad (12)$$


and so the expected Fisher memory can be evaluated as

$$E_{Y_N}[J_N(W_N, v)] = \frac{1}{\epsilon}\, E_{Y_N}[\mathrm{trace}(W_N^2)],$$

where

$$W_N^2 = \frac{1}{N^2}\, Y_N^T Y_N Y_N^T Y_N.$$

Fix a positive semi-definite N × N matrix $\Gamma_N$ with eigenvalues $\gamma_1, \ldots, \gamma_N$. Kaban showed ([11], Lemma 2) that the expectation of $Y_N^T Y_N \Gamma_N Y_N^T Y_N$ reads

$$E_{Y_N}[Y_N^T Y_N \Gamma_N Y_N^T Y_N] = N \sigma_Y^4 \left[ (N+1)\, \Gamma_N + \mathrm{trace}(\Gamma_N)\, I_N + \mathcal{E} \sum_{i=1}^{N} \gamma_i A^{(i)} \right],$$

where $I_N$ is the N × N identity matrix, $\mathcal{E}$ is the excess kurtosis of the distribution generating the elements of $Y_N$ and the $A^{(i)}$ are N × N diagonal matrices with j-th diagonal elements equal to $\sum_{a=1}^{N} u_{a,i}^2 u_{a,j}^2$, where $u_{a,i}$ is the a-th item of the i-th eigenvector of $\Gamma_N$.

In our case $\Gamma_N = I_N$, with the standard basis vectors $e_i$ as eigenvectors (all elements equal to 0, except the i-th element, which equals 1) and eigenvalues $\gamma_1 = \gamma_2 = \cdots = \gamma_N = 1$. Note that $(e_i)_a = \delta(i, a)$, where $\delta$ is the Kronecker delta. Hence,

$$A^{(i)}_{j,j} = \sum_{a=1}^{N} \delta(i, a)\, \delta(j, a) = \delta(i, j)$$

and so $A^{(i)} = \mathrm{diag}(e_i)$. It follows that

$$\sum_{i=1}^{N} \gamma_i A^{(i)} = I_N.$$

We can now evaluate

$$N^2\, E_{Y_N}[W_N^2] = E_{Y_N}[Y_N^T Y_N Y_N^T Y_N] = E_{Y_N}[Y_N^T Y_N I_N Y_N^T Y_N] = N \sigma_Y^4 \left[ (N+1) I_N + N I_N + \mathcal{E} I_N \right] = N (2N + 1 + \mathcal{E})\, \sigma_Y^4\, I_N. \qquad (13)$$

It follows that

$$E_{Y_N}[W_N^2] = \sigma_Y^4 \left( 2 + \frac{1 + \mathcal{E}}{N} \right) I_N.$$

We thus have

$$E_{Y_N}[\mathrm{trace}(W_N^2)] = \sigma_Y^4\, (2N + 1 + \mathcal{E}).$$

From (11) and (12) we conclude that as $N \to \infty$, the expected normalized Fisher memory,

$$E_{Y_N}[\bar{J}_N(W_N, v)] = \frac{1}{N \epsilon}\, E_{Y_N}[\mathrm{trace}(W_N^2)],$$

converges to $2\sigma_Y^4/\epsilon$. □
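Theorem 4 can likewise be checked by simulation. With Gaussian entries the excess kurtosis vanishes, so $E_{Y_N}[\mathrm{trace}(W_N^2)]/N = \sigma_Y^4 (2N+1)/N$ should already be close to $2\sigma_Y^4$ at moderate N; the parameters below are illustrative.

```python
import numpy as np

# Monte Carlo sketch for Theorem 4 (toy parameters): with v collinear with the
# sum of eigenvectors, eps * J_bar = trace(W_N^2) / N, whose expectation is
# sigma_Y^4 (2N + 1 + E) / N -> 2 sigma_Y^4 (Gaussian entries: E = 0).
rng = np.random.default_rng(4)
N, sigma_Y, trials = 400, 0.4, 50

vals = []
for _ in range(trials):
    Y = rng.normal(scale=sigma_Y, size=(N, N))
    W = Y.T @ Y / N                    # construction (7)-(8)
    vals.append(np.trace(W @ W) / N)   # eps * J_bar for this realization
print(np.mean(vals), "vs limit", 2 * sigma_Y**4)
```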

This result offers an interesting insight into the asymptotic behavior of Fisher memory for symmetric dynamic reservoirs. Whereas in the case of the direct Wigner construction, when changing the input weight vector from the optimal setting of the leading eigenvector of the dynamic coupling $W_N$ to the sum of its eigenvectors, the Fisher memory drops by a factor of 4, in the case of symmetric positive semi-definite $W_N$ obtained as a product of random matrices, the drop in Fisher memory is twice as large - by a factor of 8 (see Theorems 3 and 4). A deeper investigation of this phenomenon is beyond the scope of this study and here we can only speculate. It is possible that the faster drop in the Fisher memory reflects the fact that while the diagonal and upper-off-diagonal coupling weights in Wigner ESN are truly independent, in the positive semi-definite construction a more rigid dependency structure is imposed.

5. Discussion and Conclusion

We have rigorously studied the Fisher memory of two subclasses of linear input driven dynamical systems. In order to study how the memory properties of such systems scale with the system size, we investigated Fisher memory normalized per state space dimension.

The first subclass has a dynamical coupling formed by Wigner random matrices. Such systems can be viewed as Echo State Networks with Wigner reservoir coupling. Several interesting findings were derived, in particular:


1. as the system size grows, the contribution of self-coupling of the states (self-loops on reservoir units in ESN) to the normalized Fisher memory is negligible;

2. the maximal normalized Fisher memory is achieved when the input-to-state coupling is collinear with the dominant eigenvector of the state space coupling matrix; and

3. when the input-to-state coupling is collinear with the sum of eigenvectors of the state space coupling, the expected normalized memory is four times smaller than the maximal memory value achieved when collinearity with the dominant eigenvector only is employed.

The dynamical coupling of the second subclass of randomized symmetric reservoirs constructs positive semi-definite dynamical couplings as products of unconstrained random matrices. Interestingly enough, while in the case of Wigner reservoirs, when changing the input weight vector from the optimal setting of the leading eigenvector of the dynamic coupling to the sum of its eigenvectors, the Fisher memory drops by a factor of 4, in the case of symmetric positive semi-definite couplings obtained as a product of random matrices, the drop in the Fisher memory is twice as large.

Note that in the case of positive semi-definite dynamical couplings, we no longer have the possibility of setting the variances of self-loop weights independently of the variances of the inter-neuron connections. Hence, a direct investigation of the influence of self-loops on Fisher memory in the asymptotic regime is not possible. However, one suspects that, as in the case of Wigner reservoirs, the main contributions to memory come from the inter-neuron couplings and for large reservoirs, memories with or without self-loops will not differ significantly. Nevertheless, showing this rigorously will require considering more complex reservoir weight generation mechanisms that are beyond the scope of this study.

One can legitimately ask whether the restriction to linear and symmetric reservoirs does not severely limit the scope of this study with respect to reservoirs used in practice. First, in general, memory quantifications of dynamical systems, such as memory capacity or Fisher memory, quantify capabilities of dynamical systems that are not necessarily directly related to their usability as universal dynamical filters when performing general predictive tasks. Second, of course, as in many other areas of science, deeper theory with closed-form expressions for the quantities of interest is possible (at least initially) only for sufficiently constrained cases. This is the case here - the first study of the asymptotic behavior of Fisher memory. The linearity and symmetry allowed us to use the theory of random matrices and rotate the co-ordinate axes to the natural diagonalised eigen-system.

This study can nevertheless bring some substance to the debate on the "optimal" reservoir construction. While the reservoir architecture should reflect the task to be tackled, it is surprising how universal randomized reservoirs can be for a wide variety of tasks. What aspects of dynamic reservoirs contribute to this universality? If one believes that memory properties can play a role in making dynamical systems good general-purpose dynamical filters, then this study suggests that independent generation of individual reservoir coupling weights can be preferable to (perhaps more sophisticated) dependencies between the weights. It also suggests that, especially for larger reservoirs, self-couplings are much less important than cross-neuron communications. Systematic empirical and theoretical investigations along those lines, possibly making connections to neuroscience, are a matter for future research.

Acknowledgement: This work was supported by the EPSRC grant "Personalised Medicine Through Learning in the Model Space" (grant number EP/L000296/1).

References

[1] G. Anderson, A. Guionnet, and O. Zeitouni. An Introduction to Random Matrices (Cambridge Studies in Advanced Mathematics). Cambridge University Press, 2010.

[2] Witali Aswolinskiy, Felix Reinhart, and Jochen J. Steil. Modelling of Parameterized Processes via Regression in the Model Space. In Proceedings of the 24th European Symposium on Artificial Neural Networks, pages 53-58, 2016.

[3] Terry Bossomaier, Lionel Barnett, Michael Harré, and Joseph T. Lizier. An Introduction to Transfer Entropy: Information Flow in Complex Systems. Springer Publishing Company, Incorporated, 1st edition, 2016.

[4] Romain Couillet, Gilles Wainrib, Harry Sevi, and Hafiz Tiomoko Ali. The asymptotic performance of linear echo state neural networks. Journal of Machine Learning Research, 17(178):1-35, 2016.


[5] J. Dambre, David Verstraeten, Benjamin Schrauwen, and Serge Massar. Information processing capacity of dynamical systems. Scientific Reports, 2:514, 2012.

[6] S. Ganguli, D. Huh, and H. Sompolinsky. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences, 105:18970-18975, 2008.

[7] Surya Ganguli and Haim Sompolinsky. Short-term memory in neuronal networks through dynamical compressed sensing. In Advances in Neural Information Processing Systems, pages 667-675, 2010.

[8] Lyudmila Grigoryeva, Julie Henriques, Laurent Larger, and Juan-Pablo Ortega. Nonlinear memory capacity of parallel time-delay reservoir computers in the processing of multidimensional signals. Neural Computation, 28(7):1411-1451, 2016.

[9] H. Jaeger. Short term memory in echo state networks. Technical Report GMD Report 152, German National Research Center for Information Technology, 2002.

[10] H. Jaeger and H. Haas. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless telecommunication. Science, 304:78-80, 2004.

[11] Ata Kaban. New bounds on compressive linear least squares regression. In The 17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014), volume 33, pages 448-456, 2014.

[12] R. Legenstein and W. Maass. What makes a dynamical system computationally powerful? In S. Haykin, J. C. Principe, T. Sejnowski, and J. McWhirter, editors, New Directions in Statistical Signal Processing: From Systems to Brains, pages 127-154. MIT Press, 2007.

[13] Joseph T. Lizier, Mikhail Prokopenko, and Albert Y. Zomaya. Detecting non-trivial computation in complex dynamics. In Proceedings of the 9th European Conference on Advances in Artificial Life, ECAL'07, pages 895-904, Berlin, Heidelberg, 2007. Springer-Verlag.


[14] Joseph T. Lizier, Mikhail Prokopenko, and Albert Y. Zomaya. Local measures of information storage in complex distributed computation. Information Sciences, 208:39-54, November 2012.

[15] M. Lukosevicius and H. Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3):127-149, 2009.

[16] O. Obst and J. Boedecker. Guided self-organization of input-driven recurrent neural networks. In Guided Self-Organization: Inception, Emergence, Complexity and Computation, pages 319-340. Springer, Berlin, Heidelberg, 2014.

[17] Oliver Obst, Joschka Boedecker, and Minoru Asada. Improving recurrent neural network performance using transfer entropy. In Proceedings of the 17th International Conference on Neural Information Processing: Models and Applications - Volume Part II, ICONIP'10, pages 193-200, Berlin, Heidelberg, 2010. Springer-Verlag.

[18] M. Rudelson and R. Vershynin. Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the International Congress of Mathematicians, Volume III, pages 1576-1602. Hindustan Book Agency, New Delhi, 2010.

[19] T. Tao. Topics in Random Matrix Theory. Graduate Studies in Mathematics, American Mathematical Society, 2012.

[20] P. Tino and A. Rodan. Short term memory in input-driven linear dynamical systems. Neurocomputing, 112:58-63, 2013.

[21] M. H. Tong, A. D. Bicket, E. M. Christiansen, and G. W. Cottrell. Learning grammatical structure with echo state networks. Neural Networks, 20:424-432, 2007.


Peter Tino (M.Sc. Slovak University of Technology, Ph.D. Slovak Academy of Sciences) was a Fulbright Fellow with the NEC Research Institute, Princeton, NJ, USA, and a Post-Doctoral Fellow with the Austrian Research Institute for AI, Vienna, Austria, and with Aston University, Birmingham, U.K. Since 2003, he has been with the School of Computer Science, University of Birmingham, Edgbaston, Birmingham, U.K., where he is currently a full Professor - Chair in Complex and Adaptive Systems. His current research interests include dynamical systems, machine learning, probabilistic modelling of structured data, evolutionary computation, and fractal analysis. Peter was a recipient of the Fulbright Fellowship in 1994, the U.K.-Hong-Kong Fellowship for Excellence in 2008, three Outstanding Paper of the Year Awards from the IEEE Transactions on Neural Networks in 1998 and 2011 and the IEEE Transactions on Evolutionary Computation in 2010, and the Best Paper Award at ICANN 2002. He serves on the editorial boards of several journals.
