
Integrated VAC: A robust strategy for identifying eigenfunctions of dynamical operators

Chatipat Lorpaiboon,∗,†,‡,⊥ Erik Henning Thiede,∗,¶,§,⊥ Robert J. Webber,∗,‖,⊥

    Jonathan Weare,∗,‖ and Aaron R. Dinner∗,†,‡

†Department of Chemistry, University of Chicago, Chicago, IL 60637
‡James Franck Institute, University of Chicago, Chicago, IL 60637

¶Flatiron Institute, New York, NY 60637
§Department of Computer Science, University of Chicago, Chicago, IL 60637

‖Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
⊥Equal Contributions

E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

    Abstract

One approach to analyzing the dynamics of a physical system is to search for long-lived patterns in its motions. This approach has been particularly successful for molecular dynamics data, where slowly decorrelating patterns can indicate large-scale conformational changes. Detecting such patterns is the central objective of the variational approach to conformational dynamics (VAC), as well as the related methods of time-lagged independent component analysis and Markov state modeling. In VAC, the search for slowly decorrelating patterns is formalized as a variational problem solved by the eigenfunctions of the system's transition operator. VAC computes solutions to this variational problem by optimizing a linear or nonlinear model of the eigenfunctions using time series data. Here, we build on VAC's success by addressing two practical limitations. First, VAC can give poor eigenfunction estimates when the lag time parameter is chosen poorly. Second, VAC can overfit when using flexible parameterizations such as artificial neural networks with insufficient regularization. To address these issues, we propose an extension that we call integrated VAC (IVAC). IVAC integrates over multiple lag times before solving the variational problem, making its results more robust and reproducible than VAC's.

    Introduction

Many physical systems exhibit motion across fast and slow timescales. Whereas individual subcomponents may relax rapidly to a quasi-equilibrium, large collective motions occur over timescales that are orders of magnitude longer. These slow motions are often the most scientifically significant. For instance, observing the large-scale conformational changes that govern protein function requires microseconds to days, even though individual atomic vibrations have periods of femtoseconds. However, when exploring new systems, such slow collective processes may not be fully understood from the outset. Rather, they must be detected from time series data.

One approach for automating this process is the "variational approach to conformational dynamics" (VAC).1–3 In the VAC framework, slow dynamical processes are identified using functions that decorrelate slowly. These functions are the eigenfunctions of a self-adjoint operator associated with the system's dynamics known as the transition operator. The transition operator evolves expectations of functions over the system's state forward in time and completely defines the dynamics on a distributional level. VAC estimates the transition operator's eigenfunctions by constructing a linear or nonlinear model and using data to optimize parameters in the model. VAC encompasses commonly used approaches such as time-lagged independent component analysis3–6 and eigenfunction estimates constructed using Markov state models.6–9 In addition, recent VAC approaches use artificial neural networks to learn approximations to the eigenfunctions.10,11

While VAC has been successful in some applications, the approach has limitations. The accuracy of the estimated eigenfunctions depends strongly on the function space in which the eigenfunctions are approximated, the amount of data available, and a hyperparameter known as the lag time. In our previous work12 we gave a comprehensive error analysis for the linear VAC algorithm. This error analysis showed that the choice of lag time can be critical to achieving an accurate VAC scheme. Choosing a lag time that is too short can cause substantial systematic bias in estimated eigenfunctions, while choosing a lag time that is too long can make VAC exponentially sensitive to sampling error.

In this paper, we present an extension of the VAC procedure in which we integrate the correlation functions in VAC over a time window. We term this approach integrated VAC (IVAC). Because IVAC is less sensitive to the choice of lag time, it reduces error compared to VAC. Additionally, when IVAC is applied using an approximation space parameterized by a neural network, the approach leads to stable training and mitigates the overfitting problems associated with VAC.

We organize the rest of the paper as follows. In the theory section, we review the role of the transition operator and its eigenfunctions, and we introduce the VAC approach for estimating eigenfunctions. We then present the procedure for IVAC. In the results section, we evaluate the performance of IVAC on two model systems. We conclude with a summary and a discussion of further ways IVAC can be extended.

    Methods

    Background

In this section, we review the VAC theoretical framework1,13 that shows how the slowly decorrelating functions in a physical system can be identified using a linear operator known as the transition operator.

We assume that the system of interest is a continuous-time Markov process $X_t \in \mathbb{R}^n$ with a stationary, ergodic distribution µ (specifically a Feller process14). We use E to denote expectations of the process $X_t$ started from µ. For example, if µ is the Boltzmann distribution associated with the Hamiltonian H and temperature T, then expectations of the process satisfy

$$ \mathrm{E}[f(X_t)] = \frac{\int f(x)\, e^{-H(x)/k_B T}\, dx}{\int e^{-H(x)/k_B T}\, dx} \tag{1} $$

for all t ≥ 0. However, our results are valid for systems with other, more general, stationary distributions.

    The transition operator

To begin, we consider the space of real-valued functions with finite second moment ($\mathrm{E}[f(X_0)^2] < \infty$). Equipped with the inner product

$$ \langle f, g \rangle = \mathrm{E}[f(X_0)\, g(X_0)] \tag{2} $$

this forms a Hilbert space, which we denote $L^2_\mu$. We define the transition operator14 at a lag time τ to be the operator

$$ T_\tau f(x) = \mathrm{E}[f(X_\tau) \mid X_0 = x] \tag{3} $$

applied to a function $f \in L^2_\mu$. Here, we are interpreting the conditional expectation as a function of the initial point x.

The transition operator is also called the Markov or (stochastic) Koopman operator.6,15 We use the term transition operator as it is well-established in the literature on stochastic processes, and the terminology emphasizes the connection with finite-state Markov chains. For a finite-state Markov chain, f is a column vector and $T_\tau$ is a row-stochastic transition matrix.

The transition operator lets us rewrite correlation functions in terms of inner products in $L^2_\mu$:

$$ \mathrm{E}[f(X_0)\, g(X_\tau)] = \langle f, T_\tau g \rangle. \tag{4} $$

Moreover, we can express the slow motions of a system's dynamics in terms of the transition operator. The slow motions are identified by functions f for which the normalized correlation function

$$ \frac{\mathrm{E}[f(X_0) f(X_\tau)]}{\mathrm{E}[f(X_0) f(X_0)]} = \frac{\langle f, T_\tau f \rangle}{\langle f, f \rangle} \tag{5} $$

is large. We will show in the next subsection that these slowly decorrelating functions lie in the linear span of the top eigenfunctions of the transition operator.

    Eigenfunctions of the transition operator

We can immediately see that $T_\tau$ has the constant function as an eigenfunction, because

$$ T_\tau 1 = \mathrm{E}[1 \mid X_0 = x] = 1. \tag{6} $$

However, there is no guarantee that any other eigenfunctions exist. We must therefore impose additional assumptions.

We first assume that $X_t$ obeys detailed balance. For any functions $f, g \in L^2_\mu$, we have

$$ \mathrm{E}[f(X_0)\, g(X_\tau)] = \mathrm{E}[f(X_\tau)\, g(X_0)], \tag{7} $$

or equivalently

$$ \langle f, T_\tau g \rangle = \langle T_\tau f, g \rangle. \tag{8} $$

This detailed balance condition ensures that $T_\tau$ is a self-adjoint operator on $L^2_\mu$.

Next we assume that $T_\tau$ is a compact operator. In our context, assuming compactness is the same as assuming that the action of $T_\tau$ can be decomposed as an infinite sum involving eigenfunctions and eigenvalues:

$$ T_\tau f(x) = \sum_{i=1}^{\infty} e^{-\sigma_i \tau} \langle \eta_i, f \rangle\, \eta_i(x). \tag{9} $$

Our assumption of compactness is made for the sake of simplicity; in fact a weaker assumption of quasi-compactness is sufficient. We refer the reader to Webber et al.12 for a more general treatment.

At all lag times τ > 0, the function $\eta_i$ is an eigenfunction of the transition operator $T_\tau$ with eigenvalue

$$ \lambda_i^\tau = e^{-\sigma_i \tau}. \tag{10} $$

The eigenvalues are indexed so that

$$ 0 = \sigma_1 < \sigma_2 \le \sigma_3 \le \cdots \tag{11} $$

and $\lim_{i\to\infty} \sigma_i = \infty$. Because the process is ergodic, it is known that the largest eigenvalue $\lambda_1^\tau = 1$ is a simple eigenvalue and all other eigenvalues are bounded away from 1. The particular dependence of the eigenvalues on τ occurs because the transition operator can be written as

$$ T_\tau = e^{\mathcal{L}\tau}, \quad \forall \tau \ge 0 \tag{12} $$

where $\mathcal{L}$ is an operator known as the infinitesimal generator.14 We note that it is also common to consider the implied timescale (ITS) associated with eigenfunction i, defined as

$$ \mathrm{ITS}_i = \sigma_i^{-1}. \tag{13} $$

We can use the eigenvalues and eigenfunctions of the transition operator to rewrite the normalized correlation function (5). Observing that $T_0 f(x) = f(x)$ and substituting (9) into the numerator and denominator of (5) gives

$$ \frac{\mathrm{E}[f(X_0) f(X_\tau)]}{\mathrm{E}[f(X_0) f(X_0)]} = \frac{\sum_{i=1}^{\infty} e^{-\sigma_i \tau} \langle \eta_i, f \rangle^2}{\sum_{i=1}^{\infty} \langle \eta_i, f \rangle^2}. \tag{14} $$

We now consider which functions maximize the normalized correlation function. Applying (11), we find that the normalized correlation function is maximized when we set f to be the constant function $f(x) = \eta_1(x) = 1$, because

$$ \frac{\sum_{i=1}^{\infty} e^{-\sigma_i \tau} \langle \eta_i, f \rangle^2}{\sum_{i=1}^{\infty} \langle \eta_i, f \rangle^2} \le \frac{\sum_{i=1}^{\infty} e^{-\sigma_1 \tau} \langle \eta_i, f \rangle^2}{\sum_{i=1}^{\infty} \langle \eta_i, f \rangle^2} \tag{15} $$

$$ = e^{-\sigma_1 \tau} \tag{16} $$

for all functions $f \in L^2_\mu$. If we constrain the search to functions that are orthogonal to $\eta_1$, i.e., functions where

$$ \langle \eta_1, f \rangle = \mathrm{E}[f(X_0)] = 0 \tag{17} $$

and assume $\sigma_2 < \sigma_3$, the normalized correlation function is maximized when $f = \eta_2$. If we constrain f to be orthogonal to both $\eta_1$ and $\eta_2$, then the next slowest decorrelating function would be $\eta_3$, and so forth. Maximizing the normalized correlation function at any lag time τ is therefore equivalent to identifying the eigenfunctions of the transition operator.

Because of the connection to slowly decorrelating functions, the eigenfunctions provide a natural coordinate system for dimensionality reduction. The first few eigenfunctions provide a compact representation of all the slowest motions of the system. Additionally, clustering data based on the eigenfunction values makes it possible to identify metastable states.

The variational approach to conformational dynamics

The "variational approach to conformational dynamics" (VAC) is a procedure for identifying eigenfunctions by maximizing the normalized correlation function. The first eigenfunction, $\eta_1(x) = 1$, is known exactly and is set to the constant function. To identify subsequent eigenfunctions, we parameterize a candidate solution f using a vector of parameters θ. We then construct an estimate $\gamma_i$ for the ith eigenfunction by tuning the parameters to maximize (5). We set $\gamma_i = f_{\theta'}$, where

$$ \theta' = \arg\max_\theta \frac{\mathrm{E}[f_\theta(X_0) f_\theta(X_\tau)]}{\mathrm{E}[f_\theta(X_0) f_\theta(X_0)]} \tag{18} $$

subject to $\langle f_\theta, \gamma_j \rangle = 0$ for all $j < i$. In practice, we use empirical estimates of the correlations constructed from sampled data. For instance, if our data set consists of a single equilibrium trajectory $x_0, x_\Delta, \ldots, x_{T-\Delta}$, we would then construct the estimate

$$ \hat{\mathrm{E}}[f(X_0)\, g(X_\tau)] = \frac{\Delta}{T - \tau} \sum_{s=0}^{(T-\Delta-\tau)/\Delta} \frac{f(x_{s\Delta})\, g(x_{s\Delta+\tau}) + f(x_{s\Delta+\tau})\, g(x_{s\Delta})}{2}. \tag{19} $$

Here and in the rest of the paper, we use the ˆ symbol to indicate quantities constructed using sampled data.

Once we have obtained an estimated eigenfunction $\hat\gamma_i$ using data, we can estimate the associated eigenvalue and implied timescale using

$$ \hat\lambda_i^\tau = \frac{\hat{\mathrm{E}}[\hat\gamma_i(X_0)\hat\gamma_i(X_\tau)]}{\hat{\mathrm{E}}[\hat\gamma_i(X_0)\hat\gamma_i(X_0)]} \tag{20} $$

$$ \hat\sigma_i = -\frac{1}{\tau} \log \hat\lambda_i^\tau. \tag{21} $$

If the sampling is perfect, the variational principle ensures that VAC eigenvalues and VAC implied timescales are bounded from above by the true eigenvalues $e^{-\sigma_i \tau}$ and implied timescales $\sigma_i^{-1}$, and the upper bound is achieved when the VAC eigenfunction is the true eigenfunction $\eta_i$. However, since the empirical estimate (20) is used in practice, it is possible to obtain estimates that exceed the variational upper bound.
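To make these estimators concrete, here is a minimal NumPy sketch of the symmetrized correlation estimate (19) and the eigenvalue and implied timescale estimates (20) and (21). The function names are ours, and we assume the candidate eigenfunction has already been evaluated at equally spaced frames of a single equilibrium trajectory.

```python
import numpy as np

def empirical_corr(f_traj, g_traj, lag):
    # Symmetrized empirical estimate of E[f(X_0) g(X_tau)], eq (19).
    # f_traj, g_traj: 1D arrays of function values at equally spaced frames;
    # lag: lag time in units of the sampling interval.
    if lag == 0:
        return np.mean(f_traj * g_traj)
    forward = f_traj[:-lag] * g_traj[lag:]
    backward = f_traj[lag:] * g_traj[:-lag]
    return np.mean(forward + backward) / 2.0

def vac_eigenvalue_and_its(gamma_traj, lag, dt=1.0):
    # Eigenvalue (eq 20) and implied timescale (eq 21) for an estimated
    # eigenfunction evaluated along the trajectory; dt converts frames to time.
    lam = empirical_corr(gamma_traj, gamma_traj, lag) / np.mean(gamma_traj**2)
    sigma = -np.log(lam) / (lag * dt)
    return lam, 1.0 / sigma
```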

The earliest VAC approaches estimated the eigenfunctions of the transition operator by using linear combinations of basis functions $\{\phi_i\}$, a procedure now known as linear VAC. In linear VAC, the optimization parameters are the unknown linear coefficients v, which solve the generalized eigenvalue problem

$$ \hat C(\tau) v_i = \hat\lambda_i^\tau \hat C(0) v_i, \tag{22} $$

where

$$ \hat C_{jk}(t) = \hat{\mathrm{E}}[\phi_j(X_0)\, \phi_k(X_t)]. \tag{23} $$
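Once the basis functions have been evaluated along the trajectory, linear VAC reduces to a generalized symmetric eigenvalue problem. The sketch below (NumPy/SciPy; a minimal illustration under our own naming, not the authors' implementation) builds both matrices with the symmetrized estimator (19) and solves (22).

```python
import numpy as np
from scipy.linalg import eigh

def linear_vac(basis_traj, lag):
    # basis_traj: (n_frames, n_basis) array whose rows are
    # [phi_1(x_t), ..., phi_S(x_t)]; lag is in units of the sampling interval.
    x, y = basis_traj[:-lag], basis_traj[lag:]
    c0 = (x.T @ x + y.T @ y) / (2 * len(x))   # C(0), eq (23) at t = 0
    ct = (x.T @ y + y.T @ x) / (2 * len(x))   # symmetrized C(tau)
    evals, coeffs = eigh(ct, c0)              # solves eq (22)
    order = np.argsort(evals)[::-1]           # largest eigenvalues first
    return evals[order], coeffs[:, order]
```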

In approaches known as time-lagged independent component analysis4 and relaxation mode analysis,13,16 the basis functions $\{\phi_i\}$ were chosen to be the system's coordinate axes. This choice of approximation space is still commonly used to construct collective variable spaces either for analyzing dynamics or for streamlining further sampling. Markov state models (MSMs) provide an alternative approach for estimating eigenfunctions using linear combinations of basis functions.7,9,17,18 MSMs can serve as general dynamical models for the estimation of metastable structures and chemical rates.18–21 When MSMs are applied to estimate eigenfunctions and eigenvalues, the approach is equivalent to performing linear VAC using a basis of indicator functions on disjoint sets.6

Noé and Nüske1 unified the linear VAC approaches and exploited a general variational principle for identifying eigenvalues and eigenfunctions of the transition operator. Subsequent work further developed the methodology and introduced more general linear basis functions.2,22–24 Moreover, it was observed that the general variational principle allows one to model the eigenfunctions using nonlinear approximation spaces such as the output of a neural network.10,11 This can lead to very flexible and powerful approximation spaces. However, in our experience, the greater flexibility can also lead to overfitting problems that need to be addressed through regularization.

In a common nonlinear VAC approach, a neural network outputs a set of functions $\phi_1, \phi_2, \ldots, \phi_S$ that serve as a basis set for linear VAC calculations. The network parameters are then optimized to maximize the VAMP score,25 which under our assumption of detailed balance can be calculated using

$$ \text{VAMP-}k = \sum_{i=1}^{S} |\hat\lambda_i^\tau|^k. \tag{24} $$

The hyperparameter k is typically set to 1 or 2. In this paper, we use the VAMP-1 score, since we find that it leads to more robust training. We note that the score function we use is also called the generalized matrix Rayleigh quotient.26

    Challenges in VAC calculations

A major challenge in VAC calculations is selecting the lag time τ. Since the early days of VAC, it was noted that lag times that are too short or too long can lead to inaccurate eigenfunction estimates.27,28 Our recent work12 revealed that the sensitivity to lag time is caused by a combination of approximation error at short lag times and estimation error at long lag times. In this section, we describe the impact of approximation error and estimation error and provide a schematic (Figure 1) that illustrates the trade-off between approximation error and estimation error at different lag times.

Approximation error is the systematic error of VAC that exists even when VAC is performed with an infinite data set. We expect approximation error to dominate the calculation when the basis set is of poor quality and our approximation space cannot faithfully represent the eigenfunctions of the transition operator. The approximation error is greatest at short lag times, and it decreases and eventually stabilizes as the lag time is increased. Therefore, VAC users can typically reduce approximation error by avoiding the very shortest lag times.

Estimation error is the random error of VAC that comes from statistical sampling. As shown in our previous work,12 with increasing lag time the results of VAC become exponentially sensitive to small variations in the data set, leading to high estimation error. At large enough lag times, all the eigenfunction estimates $\hat\gamma_2^\tau, \hat\gamma_3^\tau, \ldots$ are essentially random noise.

Figure 1: Schematic illustrating the sources of VAC error (approximation error, estimation error, and total error) as functions of lag time. Even without sampling, VAC solutions have approximation error. Random variation due to sampling contributes additional estimation error.


In Webber et al.12, we proposed measuring VAC's sensitivity to estimation error using the condition number $\kappa_\tau$. The condition number measures the largest possible changes that can occur in the subspace of VAC eigenfunctions $\{\gamma_j^\tau, \gamma_{j+1}^\tau, \ldots, \gamma_k^\tau\}$ when there are small errors in the entries of $C(0)$ and $C(\tau)$. The condition number is calculated using the expression

$$ \kappa_\tau = \frac{1}{\min\{\hat\lambda_{j-1}^\tau - \hat\lambda_j^\tau,\; \hat\lambda_k^\tau - \hat\lambda_{k+1}^\tau\}}. \tag{25} $$

For a given problem and a given lag time, we can use the condition number to determine which subspaces of VAC eigenfunctions are highly sensitive to estimation error and which subspaces are comparatively less sensitive to estimation error.

Although we rigorously derived the condition number only in the case of linear VAC, we find that the condition number is also helpful for measuring estimation error in nonlinear VAC. If $\kappa_\tau \gtrsim 5$ at all lag times τ, then identifying eigenfunctions is very difficult and requires a large data set. We recommend that authors report the condition number along with their VAC results, helping readers to assess whether the results are potentially sensitive to estimation error.
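Equation (25) is simple to transcribe; the helper below is a hypothetical sketch that assumes the estimated eigenvalues are supplied in descending order (index 1 corresponding to the constant eigenfunction) and that a subspace starting at j = 1 only involves the lower spectral gap.

```python
def condition_number(evals, j, k):
    # Condition number (eq 25) for the subspace of eigenfunctions j..k
    # (1-indexed). evals: eigenvalues sorted in descending order.
    gaps = [evals[k - 1] - evals[k]]              # lambda_k - lambda_{k+1}
    if j > 1:
        gaps.append(evals[j - 2] - evals[j - 1])  # lambda_{j-1} - lambda_j
    return 1.0 / min(gaps)
```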

    Integrated VAC

To address the difficulty inherent in choosing a good lag time, we propose an extension of VAC called "integrated VAC" (IVAC) where we integrate over a range of different lag times before solving a variational problem. We find that the new approach is more robust to lag time selection and it often gives better results overall.

Just as VAC maximizes the correlation function in (5), IVAC solves a variational problem by identifying a subspace of functions f that maximize the integrated correlation function

$$ \int_{\tau_{\min}}^{\tau_{\max}} \frac{\mathrm{E}[f(X_0) f(X_s)]}{\mathrm{E}[f(X_0) f(X_0)]}\, ds. \tag{26} $$

As in VAC, the functions solving the variational problem are the eigenfunctions of the transition operator. When the eigenfunction $\eta_i$ is substituted into the integrated correlation function (26), the resulting expression is related to the implied timescales by

$$ \int_{\tau_{\min}}^{\tau_{\max}} \frac{\mathrm{E}[\eta_i(X_0)\eta_i(X_s)]}{\mathrm{E}[\eta_i(X_0)\eta_i(X_0)]}\, ds = \frac{e^{-\sigma_i \tau_{\min}} - e^{-\sigma_i \tau_{\max}}}{\sigma_i}. \tag{27} $$

Therefore, like VAC, IVAC is a variational approach for identifying both eigenfunctions and implied timescales.

IVAC is a natural extension of VAC; in the limit as $\tau_{\max}$ approaches $\tau_{\min}$, IVAC gives the same eigenfunction and implied timescale estimates as regular VAC. However, when $\tau_{\max}$ and $\tau_{\min}$ are separated from each other, the results of IVAC and VAC start to diverge. We find that IVAC with minimal tuning performs comparably to VAC with optimal tuning. IVAC has the desirable feature that it is not very sensitive to the values of $\tau_{\min}$ and $\tau_{\max}$.

Previous approaches for estimating eigenfunctions using multiple time lags have attempted to reduce approximation error by accounting for unobserved degrees of freedom.29–32 In contrast, IVAC uses multiple time lags to reduce estimation error and improve robustness to parameter choice.

    Linear IVAC

Linear IVAC uses linear combinations of basis functions to maximize the integrated autocorrelation function (26). However, as simulation data are sampled at discrete time points, we cannot directly calculate the integral. We therefore replace (26) with a discrete sum taken over uniformly spaced lag times. We seek to maximize

$$ \sum_{\tau=\tau_{\min}}^{\tau_{\max}} \frac{\mathrm{E}[f(X_0) f(X_\tau)]}{\mathrm{E}[f(X_0) f(X_0)]}, \tag{28} $$

where $\tau = \tau_{\min}, \tau_{\min}+\Delta, \tau_{\min}+2\Delta, \ldots, \tau_{\max}$ and Δ is the sampling interval. The discrete sum (28) approximates (26) up to a constant multiple, and its value is maximized when f lies within the span of the top eigenfunctions of the transition operator. Setting f to be the eigenfunction $\eta_i$, we can sum the resulting finite geometric series:

$$ \sum_{\tau=\tau_{\min}}^{\tau_{\max}} \frac{\mathrm{E}[\eta_i(X_0)\eta_i(X_\tau)]}{\mathrm{E}[\eta_i(X_0)\eta_i(X_0)]} = \frac{e^{-\sigma_i \tau_{\min}} - e^{-\sigma_i(\tau_{\max}+\Delta)}}{1 - e^{-\sigma_i \Delta}}. \tag{29} $$

In linear IVAC, we optimize linear combinations of basis functions $\{\phi_i\}$ to maximize the functional (28). The optimization parameters are the unknown linear coefficients v, which solve the generalized eigenvalue problem

$$ \hat I(\tau_{\min}, \tau_{\max})\, v_i = \hat\lambda_i \hat C(0)\, v_i, \tag{30} $$

where we have defined

$$ \hat C_{jk}(t) = \hat{\mathrm{E}}[\phi_j(X_0)\, \phi_k(X_t)] \tag{31} $$

$$ \hat I(\tau_{\min}, \tau_{\max}) = \sum_{\tau=\tau_{\min}}^{\tau_{\max}} \hat C(\tau). \tag{32} $$

We solve the generalized eigenvalue problem to obtain estimates $\hat\gamma_i$ for the transition operator's eigenfunctions. Then, we form the sum

$$ \sum_{\tau=\tau_{\min}}^{\tau_{\max}} \frac{\hat{\mathrm{E}}[\hat\gamma_i(X_0)\hat\gamma_i(X_\tau)]}{\hat{\mathrm{E}}[\hat\gamma_i(X_0)\hat\gamma_i(X_0)]}, \tag{33} $$

and we estimate implied timescales by solving (29) for $\hat\sigma_i$ using a root-finding algorithm.
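The sketch below assembles $\hat I(\tau_{\min}, \tau_{\max})$ from symmetrized lagged correlation matrices, solves (30), and inverts (29) by root finding. It is a minimal NumPy/SciPy illustration under the assumptions that the basis functions are pre-evaluated along a single trajectory and that lag times are in units of the sampling interval (Δ = 1 frame); the function names are ours.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import brentq

def linear_ivac(basis_traj, tau_min, tau_max):
    # basis_traj: (n_frames, n_basis) array; lags in frames (Delta = 1).
    c0 = basis_traj.T @ basis_traj / len(basis_traj)   # C(0)
    i_mat = np.zeros_like(c0)
    for lag in range(tau_min, tau_max + 1):            # eq (32)
        x, y = basis_traj[:-lag], basis_traj[lag:]
        i_mat += (x.T @ y + y.T @ x) / (2 * len(x))    # symmetrized C(tau)
    evals, coeffs = eigh(i_mat, c0)                    # solves eq (30)
    order = np.argsort(evals)[::-1]
    return evals[order], coeffs[:, order]

def ivac_implied_timescale(corr_sum, tau_min, tau_max):
    # Solve eq (29) for sigma, given the summed correlation (33) of one
    # estimated eigenfunction. A root exists when 0 < corr_sum < the
    # number of lags in the window.
    def resid(sigma):
        series = np.exp(-sigma * tau_min) - np.exp(-sigma * (tau_max + 1))
        return series / (1.0 - np.exp(-sigma)) - corr_sum
    sigma = brentq(resid, 1e-12, 1e2)   # bracket chosen for illustration
    return 1.0 / sigma                  # implied timescale, eq (13)
```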

    Nonlinear IVAC

Nonlinear IVAC maximizes the integrated correlation function (26) by constructing approximations in a nonlinear space of functions, for example, those represented by a neural network. Specifically, the nonlinear model provides a set of functions $\phi_1, \phi_2, \ldots, \phi_S$ that serve as a basis set for linear IVAC. The parameters are trained to maximize the VAMP-k score

$$ \sum_{i=1}^{S} |\hat\lambda_i|^k \tag{34} $$

where the eigenvalues $\hat\lambda_i$ are defined using equation (30). In a linear approximation space, all values of the VAMP-k score lead to identical eigenfunction estimates. In a nonlinear approximation space, it is theoretically possible that optimizing with different values of k would lead to different estimates. However, in practice we find there is little difference between estimates at the optima. We present our results using k = 1 because it leads to the most stable convergence; we found that higher values of k are prone to large gradients and, in turn, unstable training. When k = 1, the score function can be computed using

$$ \operatorname{tr}\!\big(\hat C(0)^{-1}\, \hat I(\tau_{\min}, \tau_{\max})\big). \tag{35} $$

The main practical challenge in an application of nonlinear IVAC is that the basis functions $\phi_1, \phi_2, \ldots, \phi_S$ change at every iteration, requiring costly re-evaluation of $\hat C(0)$, $\hat I(\tau_{\min}, \tau_{\max})$, and the gradient of (35) with respect to the parameters. To reduce this cost, we have developed the batch subsampling approach described in Algorithm 1, which we apply at the start of each optimization iteration.

Algorithm 1: subsampling routine

input: data $x_0, \ldots, x_{T-\Delta}$; $\tau_{\min}$; $\tau_{\max}$; number of samples N
for n ∈ 1, 2, . . . , N do
    Sample $\tau_n$ from $\{\tau_{\min}, \ldots, \tau_{\max}\}$;
    Sample $s_n$ from $\{0, \ldots, T - \tau_n - \Delta\}$;
end
output: sampled pairs $(x_{s_n}, x_{s_n+\tau_n})$
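A direct Python transcription of Algorithm 1 (the function name is ours; frame indices and lags are in units of the sampling interval):

```python
import numpy as np

def subsample_pairs(n_frames, tau_min, tau_max, n_samples, seed=None):
    # Algorithm 1: draw random (frame, lagged-frame) index pairs.
    rng = np.random.default_rng(seed)
    taus = rng.integers(tau_min, tau_max + 1, size=n_samples)  # tau_n
    starts = rng.integers(0, n_frames - taus)                  # s_n, valid starts
    return starts, starts + taus        # indices of the pairs (x_s, x_{s+tau})
```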

In the subsampling approach, we draw a randomly chosen set of data points, which allow us to estimate the matrix entries $\hat C_{ij}(0)$ using

$$ \sum_{n=1}^{N} \frac{\phi_i(x_{s_n})\phi_j(x_{s_n}) + \phi_i(x_{s_n+\tau_n})\phi_j(x_{s_n+\tau_n})}{2N} \tag{36} $$

and the matrix entries $\hat I_{ij}(\tau_{\min}, \tau_{\max})$ using

$$ \sum_{n=1}^{N} \frac{\phi_i(x_{s_n})\phi_j(x_{s_n+\tau_n}) + \phi_i(x_{s_n+\tau_n})\phi_j(x_{s_n})}{2N\Delta/(\tau_{\max} - \tau_{\min} + \Delta)}. \tag{37} $$

After constructing these random matrices, we calculate the score function (35). We then use automatic differentiation to obtain the gradient of the score function with respect to the parameters, and we perform an optimization step. By randomly drawing new data points at each optimization step, we ensure a thorough sampling of the data set and we are able to train the nonlinear representation at reduced cost. Typically, we find that $10^3$–$10^4$ data points per batch is enough for the score function (35) to be estimated with low bias.
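A PyTorch sketch of the resulting training objective is shown below, assuming the network outputs have already been evaluated at the sampled pairs from Algorithm 1 and that Δ = 1 frame, so the number of lags in the window is $\tau_{\max} - \tau_{\min} + 1$. The function name is ours, and the score is negated because standard optimizers minimize.

```python
import torch

def neg_vamp1_score(phi_0, phi_t, n_lags):
    # phi_0, phi_t: (N, S) network outputs at frames x_{s_n} and x_{s_n+tau_n}.
    n = phi_0.shape[0]
    c0 = (phi_0.T @ phi_0 + phi_t.T @ phi_t) / (2 * n)              # eq (36)
    i_mat = n_lags * (phi_0.T @ phi_t + phi_t.T @ phi_0) / (2 * n)  # eq (37)
    score = torch.trace(torch.linalg.solve(c0, i_mat))              # eq (35)
    return -score
```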

    Results and discussion

In this section, we provide evidence that IVAC is more robust than VAC and can give more accurate eigenfunction estimates. First, we show results from applying IVAC and VAC to the alanine dipeptide. VAC can provide accurate eigenfunction estimates for this test problem owing to the large spectral gap and the approximation space that overlaps closely with the eigenfunctions of the transition operator. However, VAC requires a careful tuning of the lag time. In contrast, IVAC is much less sensitive to lag time choice. IVAC gives solutions that are comparable to VAC with the optimal lag time parameter and substantially better than VAC with a poorly chosen lag time.

Second, we show results for the villin headpiece protein. Because the data set has a small number of independent samples and the neural network approximation space is flexible and prone to overfitting, VAC and IVAC suffer from estimation error at long lag times. Despite these challenges, we present a robust protocol for choosing parameters in IVAC to limit the estimation error, and we show that IVAC is less sensitive to overfitting for this problem compared with VAC.

Application to the alanine dipeptide

In this section we compare linear IVAC and VAC applied to Langevin dynamics simulations of the alanine dipeptide (i.e., N-acetyl-alanyl-N′-methylamide) in aqueous solvent; further simulation details are given in the supporting information.

The alanine dipeptide is a well-studied model for conformational changes in proteins. Like many protein systems, the alanine dipeptide has dynamics that are dominated by transitions between metastable states. The top eigenfunctions are useful for locating barriers between states, as these eigenfunctions change sharply when passing from one well to another. We focus on estimating $\eta_2$ and $\eta_3$, as large changes in these eigenfunctions correspond to transitions over the alanine dipeptide's two largest barriers. We refer to the span of $\eta_1$, $\eta_2$, and $\eta_3$ as the 3D subspace.

In our experiments, we consider trajectories of length 10 ns and 20 ns. The trajectories are long enough to observe approximately 15 or 30 transitions respectively along the dipeptide's slowest degree of freedom. Folding simulations of proteins, such as the villin headpiece considered below, often have a similar number of transitions between the folded and unfolded states.

There are several features that make it possible for VAC to perform well on this example. First, the linear approximation space, which consists of all the dihedral angles in the molecular backbone, is small (just 9 basis functions), and it is known to overlap heavily with the top eigenfunctions of the dynamics. Second, we are estimating a well-conditioned subspace with a minimum condition number of just

$$ \min_\tau \kappa_\tau = \min_\tau \big(\hat\lambda_3^\tau - \hat\lambda_4^\tau\big)^{-1} = 1.4, \tag{38} $$

and therefore we do not expect a heavy amplification of sampling error that degrades eigenfunction estimates.

To evaluate the error in our eigenfunction estimates, we compare to "ground truth" eigenfunctions computed using a Markov state model built with a very long time series (1.5 µs) and a fine discretization of the dihedral angles. We measure error using the projection distance,33 which evaluates the overlap between one subspace and the orthogonal complement of another subspace. For subspaces $\mathcal{U}$ and $\mathcal{V}$ with orthonormal basis functions $\{u_i\}$ and $\{v_i\}$, the projection distance is given by

$$ d(\mathcal{U}, \mathcal{V}) = \sqrt{\sum_{i,j} \big(\delta_{ij} - \langle u_i, v_j \rangle^2\big)}. \tag{39} $$

This measure, which combines the error in the different eigenfunctions into a single number, is useful because VAC is typically used to identify subspaces of eigenfunctions rather than individual eigenfunctions. The maximum possible error when estimating k eigenfunctions is $\sqrt{k}$.
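A NumPy sketch of (39) appears below. It represents each subspace by a matrix of sampled function values (one column per function) and uses the Euclidean inner product as a stand-in for the sampled $L^2_\mu$ inner product; the function name is ours.

```python
import numpy as np

def projection_distance(u, v):
    # Projection distance (eq 39) between the column spans of u and v.
    qu, _ = np.linalg.qr(u)        # orthonormalize each basis
    qv, _ = np.linalg.qr(v)
    overlap = qu.T @ qv            # matrix of inner products <u_i, v_j>
    k = overlap.shape[0]           # sum_{i,j} delta_ij = k
    return np.sqrt(max(k - np.sum(overlap**2), 0.0))  # guard against roundoff
```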

Our main result from the alanine dipeptide application is that IVAC is more robust to the selection of lag time parameters than VAC. In Figure 2, we report the accuracy of IVAC and VAC for different lag times and trajectory lengths. In the left column, we show the root mean square errors (RMSE) for IVAC (orange) and VAC (purple), aggregated over thirty independent trajectories. From the aggregated results, IVAC performs nearly as well as VAC with the best possible τ and consistently gives results much better than VAC with a poorly chosen τ. The RMSE of IVAC is just 0.58 with 10 ns trajectories and 0.45 with 20 ns trajectories. These low error levels are not far from the minimum error of 0.37 that is possible using our linear approximation space.

In the right column of Figure 2, we show results for a 10 ns trajectory and a 20 ns trajectory. The trajectories were selected to help illustrate differences in the error profiles for VAC and IVAC; similar plots for all other trajectories can be found in the supporting information. We observe two key differences. First, VAC error can exhibit high-frequency stochastic variability as a function of lag time, a source of variability that does not affect integrated VAC results. Second, VAC can have high error levels at very short and long lag times. The projection distance against our reference often reaches 1.0, which might indicate that a true eigenfunction is completely orthogonal to our estimated subspace. The error of IVAC is unlikely to reach such high extremes.

We note that the parameter values $\tau_{\min} = 1$ ps and $\tau_{\max} = 1$ ns used in IVAC are not hard to tune. The range 1 ps – 1 ns is a broad window of lag times over which the VAC eigenvalues $\hat\lambda_2^\tau$ and $\hat\lambda_3^\tau$ decrease from values near one to values near zero. In contrast, it is much harder to tune the VAC lag time τ. VAC results are very sensitive to high or low lag times, as seen in Figure 2.

Figure 2: Linear IVAC and VAC errors for alanine dipeptide trajectories. IVAC was applied with $\tau_{\min} = 1$ ps and $\tau_{\max} = 1$ ns. VAC was applied with variable lag time τ (horizontal axis). Errors are computed using the projection distance to the MSM reference for the span of $\eta_2$ and $\eta_3$. (left) Root mean square errors (RMSE) over 30 independent trajectories, for 10 ns and 20 ns trajectories. (right) Errors for a single trajectory of each length.


Figure 3: Clusters on the eigenfunctions estimated using VAC and IVAC compared with clusters on an accurate MSM, plotted against the dipeptide's φ and ψ dihedral angles. (left of the dashed line) VAC (τ = 1 ps and τ = 25 ps) and IVAC (τ = 1–1000 ps) results for the 20 ns trajectory from Figure 2. (right of the dashed line) Clustering on $\eta_2$ and $\eta_3$ evaluated using an accurate MSM reference (τ = 250 ps).

Figure 4: Lag time dependence of VAC and IVAC results. All results shown are for the single 20 ns alanine dipeptide trajectory in Figure 2. (left) Projection distance between VAC results at the horizontal axis lag time and VAC results at the vertical axis lag time. (center) First six estimated eigenvalues of the transition operator. (right) Error in IVAC results at different values of $\tau_{\min}$ and $\tau_{\max}$, evaluated using the projection distance.



When eigenfunction estimates are accurate, we expect that the eigenfunction coordinates will help identify the system's metastable states. In Figure 3, we compare the results of clustering configurations in the 20 ns alanine dipeptide trajectory in Figure 2 using the associated IVAC and VAC estimates. We plot the predicted metastable states against the dipeptide's φ and ψ dihedral angles. In the figure, we present VAC results taken at a short lag time, an intermediate lag time, and a long lag time. We also present results for the MSM reference. Comparing against the reference, we find that IVAC identifies clusters as accurately as VAC at a well-chosen lag time, and IVAC performs far better than VAC at a poorly-chosen lag time.

Next, we present additional analyses applied to a single 20 ns alanine dipeptide trajectory that provide insight into why IVAC is more robust to lag time selection than VAC. To start, we examine the discrepancy in VAC results at different lag times. In Figure 4, left, we performed VAC with a range of different lag times, and we measured the projection distance between the VAC results obtained at one lag time $\tau_1$ (horizontal axis) and the VAC results obtained at a different lag time $\tau_2$ (vertical axis). The square with low projection distance between 3 ps and 200 ps indicates that VAC results with lag times chosen within this range are similar to one another, but not to those with lag times taken from outside this range.

The discrepancy between VAC results at both low and high lag times can be explained by a plot of VAC eigenvalues (Figure 4, center). At 3 ps, there is an eigenvalue crossing between the eigenvalues $\hat\lambda_3^\tau$ and $\hat\lambda_4^\tau$ (shown in purple and magenta). The eigenvalue crossing causes VAC to misidentify the third VAC eigenfunction (which is inside the 3D subspace) and the fourth VAC eigenfunction (which is outside the 3D subspace). At 200 ps, there is a different problem related to insufficient sampling. The third eigenvalue descends into noise, causing VAC to fit the first two eigenfunctions at the expense of the 3D subspace.

With integrated VAC, the problem of finding a single good lag time is replaced with the problem of finding two endpoints for a range of lag times. This proves to be an easier task as IVAC is more tolerant of lag times outside the region where VAC gives good results. In Figure 4, right, we show the error of IVAC as a function of $\tau_{\min}$ and $\tau_{\max}$ (horizontal and vertical axes, respectively). This figure, which shows the error of IVAC estimates computed from comparison with the reference, is different from the figure on the left, which shows only the discrepancy between VAC results at different lag times. Figure 4, right, also shows the error of VAC, which appears along the diagonal of the plot corresponding to the case $\tau_{\min} = \tau_{\max}$.

Figure 4, right, reveals that the range of lag time parameters for which IVAC exhibits low error levels is much broader than the range of lag times for which VAC exhibits low error levels. This supports our basic argument that choosing good parameters in IVAC is easier than choosing good parameters in VAC. To achieve low errors, we do not need to identify the optimal VAC lag times but only integrate over a window that contains the optimal VAC lag times while ensuring that $\tau_{\max}$ is not excessively high.

    Application to the villin headpiece

Next we apply IVAC to a difficult spectral estimation problem with limited data. We seek to estimate the slow dynamics for an engineered 35-residue subdomain of the villin headpiece protein. Our data consist of a 125 µs molecular dynamics simulation performed by Lindorff-Larsen et al.34 Villin is a common model system for protein folding for both experimental and computational studies,34–37 where the top eigenfunctions correlate with the folding and unfolding of the protein.

On the surface, the villin data set would seem to be much larger and more useful for spectral estimation compared to the 10–20 ns trajectories we examined for the alanine dipeptide. However, the villin headpiece relaxes to equilibrium orders of magnitude more slowly than the alanine dipeptide. The data set contains just 34 folding/unfolding events with a folding time of 2.8 µs. The limited number of observed events is characteristic of simulations of larger and more complex biomolecules, since simulations require massive computational resources and conformational changes take place slowly over many molecular dynamics time steps. The fact that dynamics of villin are not understood nearly as well as the dynamics of the alanine dipeptide presents an additional challenge. Compared to the alanine dipeptide, villin has a more complex free energy surface and a larger number of degrees of freedom. Since the true eigenfunctions of the system are unknown, it is appropriate to apply spectral estimation using a large and diverse feature set. However, the large size and diversity of the feature set increases the risk of estimation error.

In contrast to the alanine dipeptide results, where we applied IVAC using linear combinations of basis functions, here we apply IVAC using a neural network. The increased flexibility of the neural network approximation reduces approximation error. However, the procedure for optimizing the neural network is more complicated than the procedure for applying linear VAC. Moreover, the complexity of the neural network representation (around $5 \times 10^4$ parameters) makes overfitting a concern for this example.

We use a slight modification of the neural network architecture published in Sidky et al.38, with 2 hidden layers of 50 neurons, tanh nonlinearities, and batch normalization between layers. The network is built on top of a rich set of features, consisting of all the Cα pairwise distances as well as sines and cosines of all dihedral angles. At each optimization step, we subsample $10^4$ data points using Algorithm 1. We optimize the neural network parameters using AdamW39 with a learning rate of $10^{-4}$ and a weight decay coefficient of $10^{-2}$. Following standard practice, we use the first half of the data set for training and the second half for validation. We validate the neural network against the testing data set every 100 optimization steps, and perform early stopping with a patience of 10.
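In PyTorch, the described network might look like the sketch below; the exact ordering of the layers and the placement of batch normalization are our assumptions rather than details taken from the authors' code.

```python
import torch.nn as nn

def make_basis_net(n_features, n_basis):
    # Two hidden layers of 50 tanh neurons with batch normalization
    # between layers; outputs n_basis functions phi_1, ..., phi_S.
    return nn.Sequential(
        nn.Linear(n_features, 50), nn.BatchNorm1d(50), nn.Tanh(),
        nn.Linear(50, 50), nn.BatchNorm1d(50), nn.Tanh(),
        nn.Linear(50, n_basis),
    )

# Optimizer as described in the text:
# torch.optim.AdamW(net.parameters(), lr=1e-4, weight_decay=1e-2)
```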

We present our results for villin in two parts. First we describe our procedure for selecting parameters in nonlinear IVAC. Next we highlight evidence that nonlinear IVAC shows greater robustness to overfitting compared to nonlinear VAC.

    Selection of parameters

Here, we describe the protocols we use for selecting IVAC parameters. By establishing clear protocols, we help ensure that IVAC performs to the best of its ability, providing robust eigenfunction estimates even in a high-dimensional setting with limited data.

Our first protocol is to evaluate the condition number for the subspace of eigenfunctions that we are estimating. This protocol is motivated by the theoretical error analysis in Webber et al.12, where we showed that spectral estimates are less sensitive to estimation error for a well-conditioned subspace. To ensure that we are estimating a well-conditioned subspace, we first use IVAC to estimate eigenvalues for the transition operator. We then identify a subspace of eigenfunctions $\eta_1, \eta_2, \ldots, \eta_k$ that is separated from all other eigenfunctions by a large spectral gap $\hat\lambda_k^\tau - \hat\lambda_{k+1}^\tau$.

For the villin data, we choose the subspace consisting only of the constant eigenfunction $\eta_1 = 1$ and the first nontrivial eigenfunction $\eta_2$. This is a well-conditioned subspace with a minimum condition number

$$ \min_\tau \kappa_\tau = \min_\tau \big(\hat\lambda_2^\tau - \hat\lambda_3^\tau\big)^{-1} = 1.6. \tag{40} $$

Our second protocol for ensuring robustness is to check that eigenfunction estimates remain consistent when the random seeds used in initialization and subsampling are changed. We train ten nonlinear IVAC neural networks and quantify the inconsistency in the results using the root mean square projection distance between eigenspace estimates from different runs. The results of this calculation are plotted in Figure 5 across a range of $\tau_{\min}$ and $\tau_{\max}$ values. The results for VAC appear along the diagonal of the plot in Figure 5, corresponding to the case $\tau_{\min} = \tau_{\max}$.

Figure 5: Nonlinear IVAC results for the 125 µs villin headpiece trajectory. (left) Estimated eigenvalues of the transition operator as a function of lag time. (right) Root mean square projection distance between 10 replicates of nonlinear IVAC at the specified values of $\tau_{\min}$ and $\tau_{\max}$.

Figure 6: Nonlinear IVAC results plotted on the first two time-lagged independent component analysis (tICA) coordinates. (top) IVAC with a 1–3 ns integration window and three different random seeds. (bottom) IVAC with a 1–100 ns integration window and three different random seeds.


Figure 5 reveals problems with consistency for both IVAC and VAC. IVAC is robust to the choice of $\tau_{\min}$. However, setting $\tau_{\max} < 30$ ns or $\tau_{\max} > 300$ ns leads to poor consistency. If we train the neural network with these problematic $\tau_{\max}$ values, then solutions can look very different depending on the random seeds that are used for optimizing. With VAC, setting τ < 10 ns or τ > 300 ns would lead to inconsistent results.

IVAC provides more flexibility to address the consistency issues compared to VAC, since we can integrate over a range of lag times. For the villin data, we choose to set $\tau_{\min} = 1$ ns and $\tau_{\max} = 100$ ns. For these parameter values, the consistency score is very good. The typical projection distance between subspaces with different random seeds is just 0.05. Moreover, 1–100 ns is a wide range of lag times, helping to ensure that optimal or near-optimal VAC lag times are included in the integration window.

To help explain why the consistency is so poor for small $\tau_{\max}$ values, we present in Figure 6 a set of IVAC solutions obtained with an integration window of 1–3 ns and three different random seeds. We see that all three solutions identify clusters in the data, but the clusters are completely different in the three cases. We conjecture that IVAC is randomly fitting three different eigenspaces. This is supported by the eigenvalue plot in Figure 5, which shows that three nontrivial eigenvalues of the transition operator lie close together over the 1–3 ns time window, making it possible that eigenspaces are randomly misidentified by IVAC.

In contrast to the inconsistent results obtained with an integration window of 1–3 ns, we obtain more reasonable results with an integration window of 1–100 ns. As shown in Figure 6, the IVAC solutions are nearly identical regardless of the random seed.

In summary, we have proposed a robust procedure for approximating eigenfunctions of the villin headpiece system. We have chosen to approximate a well-conditioned eigenspace that is separated from other eigenspaces by a wide spectral gap. Moreover, we have ensured that IVAC results are consistent regardless of the random initialization and randomly drawn subsets used to train the neural net. Because of these protocols, the neural net estimates shown in Figure 6 reliably identify clusters in the trajectory data indicative of folded/unfolded states.

    Robustness to overfitting

In this section, we present results suggesting that nonlinear IVAC is more robust to overfitting than nonlinear VAC. This is crucial if the data set is too small for cross-validation.

To identify the overfitting issue with small data sets, we eliminate the early stopping and we train IVAC and VAC until the training loss stabilizes. We calculate implied timescales by performing linear VAC on the outputs of the networks trained using IVAC and VAC, which we present in Figure 7.

Figure 7: Implied timescales (ITS) and power spectral densities (PSD) obtained with nonlinear IVAC (1–100 ns window) and VAC (100 ns lag time) with neural network basis functions applied to the villin headpiece data set, shown for both the training and test data. The VAC training lag time is marked by the dotted line in each panel.

We first compare the estimated implied timescales between the training and validation data sets. For both algorithms, the implied timescales calculated on the training data are larger than those calculated on the validation data. This is clear evidence of overfitting. However, we see that IVAC gives larger implied timescales on the validation data compared to VAC. In combination with the variational principle associated with the implied timescales, this suggests that IVAC is giving an improved estimate for the slow eigenfunctions.

Examining the implied timescales estimated on training data shows further signs of overfitting. The VAC implied timescale estimates for the training data exhibit sharp peaks at the training lag time that are absent in the implied timescale estimates of the validation data. This suggests a hypothesis for the mechanism of overfitting: with a sufficiently flexible approximation space, VAC is able to find spurious correlations between features that happen to be separated by τ. This explains the smaller peaks at integral multiples of the lag time, as features artificially correlated at τ will be correlated at 2τ as well.

To confirm our hypothesis, we plot the power spectral density (PSD)40 of the time trace of eigenfunction estimates in Figure 7. The PSD confirms the existence of a periodic component in VAC results with a frequency at the inverse training lag time. In contrast, IVAC does not exhibit such a periodic component. In Figure 7, we see that the 1–100 ns integration window leads to implied timescale estimates that depend smoothly on the data both for the training and the test data set. The PSD shows no periodic components in the spectra for IVAC, providing further evidence that IVAC is comparatively robust while VAC results can be very sensitive to the particular lag time that is used.

    Conclusion

In this paper we have presented integrated VAC (IVAC), a new extension to the popular variational approach to conformational dynamics (VAC). By integrating correlation functions over a window of lag times, IVAC provides robust estimates of the eigenfunctions of a system's transition operator.

To test the efficacy of the new approach, we compared IVAC and VAC results on two molecular systems. First, we applied the spectral estimation methods to simulation data from the alanine dipeptide. This is a relatively simple system that permits generation of extensive reference data for validating our calculations. As we varied the lag time parameters and the amount of data available, we observed the improved robustness of IVAC compared to VAC. IVAC gives low-error eigenfunction estimates even when the lag times range over multiple orders of magnitude. In contrast, VAC requires more precise lag-time tuning to give reasonable results.

Next we applied IVAC to analyze a folding/unfolding trajectory for the villin headpiece. These data contain relatively few folding/unfolding events despite pushing the limits of present computing technology. For this application, we used a flexible neural network representation built on top of a rich feature set. We presented a procedure for selecting parameters in IVAC that helps lead to robust performance in the face of uncertainty. For the application to villin data, we found that VAC exhibited pronounced artifacts from overfitting when precautions were not taken to specifically prevent it, while IVAC did not.

Our work highlights the sensitivity of VAC calculations to error from insufficient sampling. Examining our results on the villin headpiece, we see that regularization (here, by early stopping) and validation are crucial when running VAC with neural networks or other flexible approximation spaces. With insufficient regularization or poor validation these schemes easily overfit. Even for the alanine dipeptide example, where we employ a simple basis on a statistically well-conditioned problem, we see that VAC has a high probability of giving spurious results with insufficient data.

Integrated VAC addresses this problem by considering information across multiple time lags. Future extensions of the work could further leverage this information. For instance, employing a well-chosen weighting function within the integral in (26) could further decrease hyperparameter sensitivity. Additionally, future numerical experiments could point to improved procedures for selecting $\tau_{\min}$ and $\tau_{\max}$ values. Finally, we could integrate over multiple lag times in other formalisms using the transition operator, such as schemes that estimate committors and mean-first-passage times.32 These extensions would further strengthen the basic message of our work: combining information from multiple lag times leads to improved estimates of the transition operator and its properties.

Acknowledgement EHT was supported by DARPA grant HR00111890038. RJW was supported by the National Science Foundation through award DMS-1646339. CL, ARD, and JW were supported by the National Institutes of Health award R35 GM136381. JW was supported by the Advanced Scientific Computing Research Program within the DOE Office of Science through award DE-SC0020427. The villin headpiece data set was provided by D.E. Shaw Research. Computing resources were provided by the University of Chicago Research Computing Center.

Supporting Information Available

The following files are available free of charge. Alanine dipeptide simulation details and error plots for additional individual alanine dipeptide trajectories. Loss functions for villin nonlinear VAC and IVAC training.

    References

(1) Noé, F.; Nüske, F. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Modeling & Simulation 2013, 11, 635–655.

(2) Nüske, F.; Keller, B. G.; Pérez-Hernández, G.; Mey, A. S.; Noé, F. Variational approach to molecular kinetics. Journal of Chemical Theory and Computation 2014, 10, 1739–1752.

(3) Pérez-Hernández, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification of slow molecular order parameters for Markov model construction. The Journal of Chemical Physics 2013, 139, 07B604_1.

(4) Molgedey, L.; Schuster, H. G. Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters 1994, 72, 3634.

(5) Schwantes, C. R.; Pande, V. S. Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9. Journal of Chemical Theory and Computation 2013, 9, 2000–2009.

(6) Klus, S.; Nüske, F.; Koltai, P.; Wu, H.; Kevrekidis, I.; Schütte, C.; Noé, F. Data-driven model reduction and transfer operator approximation. Journal of Nonlinear Science 2018, 28, 985–1010.

(7) Schütte, C.; Fischer, A.; Huisinga, W.; Deuflhard, P. A direct approach to conformational dynamics based on hybrid Monte Carlo. Journal of Computational Physics 1999, 151, 146–168.

(8) Swope, W. C.; Pitera, J. W.; Suits, F. Describing protein folding kinetics by molecular dynamics simulations. 1. Theory. The Journal of Physical Chemistry B 2004, 108, 6571–6581.

(9) Swope, W. C.; Pitera, J. W.; Suits, F.; Pitman, M.; Eleftheriou, M.; Fitch, B. G.; Germain, R. S.; Rayshubski, A.; Ward, T. C.; Zhestkov, Y. et al. Describing protein folding kinetics by molecular dynamics simulations. 2. Example applications to alanine dipeptide and a β-hairpin peptide. The Journal of Physical Chemistry B 2004, 108, 6582–6594.

(10) Mardt, A.; Pasquali, L.; Wu, H.; Noé, F. VAMPnets for deep learning of molecular kinetics. Nature Communications 2018, 9, 5.

(11) Chen, W.; Sidky, H.; Ferguson, A. L. Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets. The Journal of Chemical Physics 2019, 150, 214114.

(12) Webber, R. J.; Thiede, E. H.; Dow, D.; Dinner, A. R.; Weare, J. Error sources in spectral estimation for Markov processes. arXiv preprint arXiv:TBD 2020.

(13) Takano, H.; Miyashita, S. Relaxation modes in random spin systems. Journal of the Physical Society of Japan 1995, 64, 3688–3698.

(14) Kallenberg, O. Foundations of Modern Probability; Springer Science & Business Media, 2006.

(15) Eisner, T.; Farkas, B.; Haase, M.; Nagel, R. Operator Theoretic Aspects of Ergodic Theory; Springer, 2015; Vol. 272.

(16) Hirao, H.; Koseki, S.; Takano, H. Molecular dynamics study of relaxation modes of a single polymer chain. Journal of the Physical Society of Japan 1997, 66, 3399–3405.

(17) Swope, W. C.; Pitera, J. W.; Suits, F. Describing protein folding kinetics by molecular dynamics simulations. 1. Theory. The Journal of Physical Chemistry B 2004, 108, 6571–6581.

(18) Pande, V. S.; Beauchamp, K.; Bowman, G. R. Everything you wanted to know about Markov State Models but were afraid to ask. Methods 2010, 52, 99–105.

(19) Vanden-Eijnden, E. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation; Springer, 2014; pp 91–100.

(20) Noé, F.; Prinz, J.-H. In An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation, vol. 797 of Advances in Experimental Medicine and Biology; Bowman, G. R., Pande, V. S., Noé, F., Eds.; Springer, 2014; Chapter 6.

(21) Keller, B. G.; Aleksic, S.; Donati, L. In Biomolecular Simulations in Drug Discovery; Gervasio, F. L., Spiwok, V., Eds.; Wiley-VCH, 2019; Chapter 4.

(22) Vitalini, F.; Noé, F.; Keller, B. A basis set for peptides for the variational approach to conformational kinetics. Journal of Chemical Theory and Computation 2015, 11, 3992–4004.

(23) Boninsegna, L.; Gobbo, G.; Noé, F.; Clementi, C. Investigating molecular kinetics by variationally optimized diffusion maps. Journal of Chemical Theory and Computation 2015, 11, 5947–5960.

(24) Schwantes, C. R.; McGibbon, R. T.; Pande, V. S. Perspective: Markov models for long-timescale biomolecular dynamics. The Journal of Chemical Physics 2014, 141, 09B201.

(25) Wu, H.; Nüske, F.; Paul, F.; Klus, S.; Koltai, P.; Noé, F. Variational Koopman models: slow collective variables and molecular kinetics from short off-equilibrium simulations. The Journal of Chemical Physics 2017, 146, 154104.

(26) McGibbon, R. T.; Pande, V. S. Variational cross-validation of slow dynamical modes in molecular kinetics. The Journal of Chemical Physics 2015, 142, 124105.

(27) Naritomi, Y.; Fuchigami, S. Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: the case of domain motions. The Journal of Chemical Physics 2011, 134, 02B617.

    (28) Husic, B. E.; Pande, V. S. Note: MSMlag time cannot be used for variational

    17

  • model selection. The Journal of ChemicalPhysics 2017, 147, 176101.

    (29) Wu, H.; Prinz, J.-H.; Noé, F. Projectedmetastable Markov processes and their es-timation with observable operator mod-els. The Journal of chemical physics 2015,143, 10B610 1.

    (30) Suárez, E.; Adelman, J. L.; Zucker-man, D. M. Accurate estimation of pro-tein folding and unfolding times: beyondMarkov state models. Journal of chemicaltheory and computation 2016, 12, 3473–3481.

    (31) Cao, S.; Montoya-Castillo, A.; Wang, W.;Markland, T. E.; Huang, X. On the ad-vantages of exploiting memory in Markovstate models for biomolecular dynamics.The Journal of Chemical Physics 2020,153, 014105.

    (32) Thiede, E. H.; Giannakis, D.; Din-ner, A. R.; Weare, J. Galerkin approxi-mation of dynamical quantities using tra-jectory data. The Journal of ChemicalPhysics 2019, 150, 244111.

    (33) Edelman, A.; Arias, T. A.; Smith, S. T.The geometry of algorithms with orthogo-nality constraints. SIAM Journal on Ma-trix Analysis and Applications 1998, 20,303–353.

    (34) Lindorff-Larsen, K.; Piana, S.;Dror, R. O.; Shaw, D. E. How fast-folding proteins fold. Science 2011, 334,517–520.

    (35) McKnight, J. C.; Doering, D. S.; Matsu-daira, P. T.; Kim, P. S. A thermostable35-residue subdomain within villin head-piece. 1996.

    (36) Kubelka, J.; Eaton, W. A.; Hofrichter, J.Experimental tests of villin subdomainfolding simulations. Journal of molecularbiology 2003, 329, 625–630.

    (37) Duan, Y.; Kollman, P. A. Pathways to aprotein folding intermediate observed in a

    1-microsecond simulation in aqueous solu-tion. Science 1998, 282, 740–744.

    (38) Sidky, H.; Chen, W.; Ferguson, A. L.High-Resolution Markov State Modelsfor the Dynamics of Trp-Cage Minipro-tein Constructed Over Slow FoldingModes Identified by State-Free ReversibleVAMPnets. The Journal of PhysicalChemistry B 2019, 123, 7999–8009,PMID: 31453697.

    (39) Loshchilov, I.; Hutter, F. DecoupledWeight Decay Regularization. 2017.

    (40) Welch, P. The use of fast Fourier trans-form for the estimation of power spec-tra: A method based on time averagingover short, modified periodograms. IEEETransactions on Audio and Electroacous-tics 1967, 15, 70–73.

    18

Graphical TOC Entry

[TOC graphic: error (0.50–1.25) versus lag time, comparing VAC run at different lag times against IVAC integrated over the window 10^0–10^3 ns.]

Supporting Information for
Integrated VAC: A robust strategy for identifying eigenfunctions of dynamical operators


Simulation details for the alanine dipeptide experiments

All simulations were conducted using GROMACS 5.1.4.1–6 The molecule was represented by the CHARMM 27 force field7 in a solvent of 513 water molecules modelled with the rigid TIP3P model.8 Long-range electrostatics were computed using fourth-order particle-mesh Ewald summation with a Fourier spacing of 0.12 nm.9 Each simulation used Langevin dynamics, integrated with the GROMACS leap-frog Langevin integrator using a 1 fs time step and a time constant of 0.1 ps at a temperature of 310 K. Bonds involving hydrogen were constrained using LINCS,10 and water rigidity was enforced using SETTLE.11 In each simulation, the system was initialized at a density of 1 kg/L. The system was then equilibrated for 50 ps at constant volume, followed by another 50 ps of equilibration at constant pressure using the Parrinello-Rahman barostat.12 Finally, the system was again equilibrated at constant volume for 50 ns. The data set used was obtained from a production run of 50 ns, with structures saved every 500 fs. To construct our references for the true eigenfunctions, we ran 10 simulations, each of length 150 ns, and constructed an MSM on all dihedral angles. This MSM had 500 Markov states, identified by k-means clustering; we estimated the eigenfunctions and eigenvalues using pyEMMA.13
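As a rough sketch of how such a reference MSM can be assembled with PyEMMA 2,13 consider the following Python fragment. The file names, the dihedral feature set, the number of clustering iterations, and the MSM lag time below are illustrative placeholders, not the exact values used in this work.

    import pyemma

    # Featurize the trajectories with dihedral angles
    # (topology and trajectory file names are hypothetical placeholders).
    feat = pyemma.coordinates.featurizer("alanine_dipeptide.pdb")
    feat.add_backbone_torsions(cossin=True)  # dihedrals as (cos, sin) pairs
    data = pyemma.coordinates.load(
        ["traj_%02d.xtc" % i for i in range(10)], features=feat
    )

    # Identify 500 Markov states by k-means clustering.
    clustering = pyemma.coordinates.cluster_kmeans(data, k=500, max_iter=100)

    # Estimate the MSM and extract reference eigenvalues and eigenfunctions;
    # the lag time (in units of saved frames) is an illustrative choice.
    msm = pyemma.msm.estimate_markov_model(clustering.dtrajs, lag=200)
    eigenvalues = msm.eigenvalues(4)
    eigenfunctions = msm.eigenvectors_right(4)  # one column per eigenfunction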

    References

(1) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1, 19–25.

(2) Páll, S.; Abraham, M. J.; Kutzner, C.; Hess, B.; Lindahl, E. Tackling exascale software challenges in molecular dynamics simulations with GROMACS. International Conference on Exascale Applications and Software, 2014; pp 3–27.

(3) Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D. et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 29, 845–854.

(4) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. GROMACS: fast, flexible, and free. Journal of Computational Chemistry 2005, 26, 1701–1718.

(5) Lindahl, E.; Hess, B.; Van Der Spoel, D. GROMACS 3.0: a package for molecular simulation and trajectory analysis. Molecular Modeling Annual 2001, 7, 306–317.

(6) Berendsen, H. J.; van der Spoel, D.; van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications 1995, 91, 43–56.

(7) MacKerell Jr, A. D.; Banavali, N.; Foloppe, N. Development and current status of the CHARMM force field for nucleic acids. Biopolymers: Original Research on Biomolecules 2000, 56, 257–265.

(8) Berendsen, H. J.; Postma, J. P.; van Gunsteren, W. F.; Hermans, J. Intermolecular Forces; Springer, 1981; pp 331–342.

(9) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A smooth particle mesh Ewald method. The Journal of Chemical Physics 1995, 103, 8577–8593.

(10) Hess, B.; Bekker, H.; Berendsen, H. J.; Fraaije, J. G. LINCS: a linear constraint solver for molecular simulations. Journal of Computational Chemistry 1997, 18, 1463–1472.

(11) Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of Computational Chemistry 1992, 13, 952–962.

(12) Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics 1981, 52, 7182–7190.

(13) Scherer, M. K.; Trendelkamp-Schroer, B.; Paul, F.; Pérez-Hernández, G.; Hoffmann, M.; Plattner, N.; Wehmeyer, C.; Prinz, J.-H.; Noé, F. PyEMMA 2: a software package for estimation, validation, and analysis of Markov models. Journal of Chemical Theory and Computation 2015, 11, 5525–5542.

[Figure: 30 panels of 3D subspace error versus lag time (10^0–10^3 ps, log scale); curves compare VAC and IVAC.]

Figure S1: VAC (at the horizontal-axis lag time) and IVAC (with τmin = 1 ps and τmax = 1 ns) errors for all 30 of the 10-ns-long alanine dipeptide trajectories.

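For orientation, the linear VAC and IVAC estimates whose errors are plotted in Figs. S1 and S2 can be written schematically in a few lines of NumPy. This is a sketch of the construction, not the authors' production code; it assumes a mean-free featurized trajectory x of shape (frames, features), with lag times measured in saved frames.

    import numpy as np
    import scipy.linalg as sla

    def lagged_cov(x, tau):
        """Symmetrized time-lagged covariance C(tau) of a mean-free trajectory."""
        n = len(x) - tau
        c = x[:n].T @ x[tau:tau + n] / n
        return 0.5 * (c + c.T)

    def vac(x, tau):
        """Linear VAC: solve the generalized eigenproblem C(tau) v = lambda C(0) v."""
        evals, evecs = sla.eigh(lagged_cov(x, tau), lagged_cov(x, 0))
        return evals[::-1], evecs[:, ::-1]  # order from slowest to fastest

    def ivac(x, tau_min, tau_max):
        """Linear IVAC: sum C(tau) over the lag window before diagonalizing."""
        c_int = sum(lagged_cov(x, tau) for tau in range(tau_min, tau_max + 1))
        evals, evecs = sla.eigh(c_int, lagged_cov(x, 0))
        return evals[::-1], evecs[:, ::-1]

The only difference between the two routines is that IVAC replaces the single-lag matrix C(τ) with its sum over the lag window, which is what makes the estimates in the right-hand columns of Figs. S1 and S2 insensitive to any one choice of lag time.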

[Figure: 30 panels of 3D subspace error versus lag time (10^0–10^3 ps, log scale); curves compare VAC and IVAC.]

Figure S2: VAC (at the horizontal-axis lag time) and IVAC (with τmin = 1 ps and τmax = 1 ns) errors for all 30 of the 20-ns-long alanine dipeptide trajectories.

[Figure: VAMP-1 score versus optimization steps for five models: IVAC (1–3 ns), IVAC (1–100 ns), VAC (100 ns), IVAC (1–3000 ns), and VAC (3000 ns); curves show train and test loss.]

Figure S3: VAMP-1 scores as a function of optimization steps during the training of the neural networks on the villin headpiece data set. Early-stopping times are marked by the black dotted lines.
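The dotted lines in Fig. S3 correspond to an early-stopping criterion. A generic patience-based rule of this kind can be sketched as follows; the patience value is a hypothetical choice for illustration, not necessarily the one used in this work.

    def should_stop(test_scores, patience=1000):
        """Return True once the best test score is at least `patience` steps old.

        `test_scores` is a non-empty list of test-set VAMP-1 scores, one per
        optimization step (higher is better)."""
        best = max(range(len(test_scores)), key=test_scores.__getitem__)
        return (len(test_scores) - 1 - best) >= patience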
