deeplearning enhanced markov state models (msms)chz379.ust.hk/songshanhu/deeplearning_msm.pdf · 2...

Feb 20, 2019

Deep learning enhanced Markov State Models (MSMs)

Wei Wang

Outline

• General protocol of building MSM

• Challenges with MSM

• VAMPnets

• Time-lagged auto-encoder

Revisit the protocol of building MSM

Need a lot of expertise in biology & machine learning

Wang, Cao, Zhu, Huang, WIREs Comput. Mol. Sci., e1343 (2017)

Criterion to choose a model: slowest dynamics

Choose the MSM that best captures the slowest transitions of the system

Wang, Cao, Zhu, Huang WIREs Comput. Mol. Sci., e1343, (2017)
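
The slowest dynamics are usually read off the implied timescales, $t_i = -\tau / \ln \lambda_i$, where $\lambda_i$ are the eigenvalues of the transition probability matrix at lag time $\tau$. A minimal NumPy sketch, assuming a hypothetical 3-state matrix (the numbers are illustrative and not taken from any simulation):

```python
import numpy as np

def implied_timescales(T, lag_time):
    """Implied timescales t_i = -lag_time / ln(lambda_i) of a row-stochastic matrix T."""
    eigvals = np.linalg.eigvals(T)
    # Sort eigenvalue magnitudes, largest first; the leading one is 1 (stationary process)
    eigvals = np.sort(np.abs(eigvals))[::-1]
    # Skip the stationary eigenvalue, whose timescale is infinite
    return -lag_time / np.log(eigvals[1:])

# Hypothetical transition probability matrix at lag time 1 (arbitrary time units)
T = np.array([[0.98, 0.01, 0.01],
              [0.02, 0.90, 0.08],
              [0.02, 0.08, 0.90]])

timescales = implied_timescales(T, lag_time=1.0)
print(timescales)  # slowest timescale first, roughly 24.5 and 5.0
```

Among candidate models built at the same lag time, the one with the longer leading timescales is the one that better resolves the slow transitions.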

Choose the one with slowest transition

Timescales (μs)

Da, Pardo, Xu, Zhang, Gao, Wang, Huang, Nature Communications, 7, 11244 (2016)

Perform this cumbersome work: search

• Propose good clustering algorithms & features

• Parameter search using good strategies

http://msmbuilder.org/osprey/1.1.0

Challenges: parameter space is too large: collective variables (CVs)

http://homepages.laas.fr/jcortes/algosb13/sutto-ALGO13-META.pdf

Need to propose good features

Challenges: parameter space is too large: CV

Need to propose good features; otherwise the clustering stage will suffer

[Figure panels: tICA, Truth]

Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)

Challenges: parameter space is too large: clustering

Zhang et al., Methods in Enzymology, 578, 343-371 (2016)

Essence of these operations

• Linearly or nonlinearly transform the protein configurations into state vectors: $x \mapsto (\chi_1, \chi_2, \ldots, \chi_n)$, with $\sum_{i=1}^{n} \chi_i = 1$

(1, 0, 0, 0)

(0, 0, 1, 0)

Husic and Pande, J. Am. Chem. Soc. 2018, 140, 2386−2396
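
The two example vectors above are hard (one-hot) assignments. A small sketch of hard versus soft state vectors, assuming hypothetical per-state scores for the soft case; both kinds of vector have non-negative entries that sum to 1:

```python
import numpy as np

def hard_assignment(labels, n_states):
    """One-hot state vectors, e.g. label 2 out of 4 states -> (0, 0, 1, 0)."""
    return np.eye(n_states)[labels]

def soft_assignment(scores):
    """Softmax over per-state scores: memberships in [0, 1] that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

# Two configurations assigned to states 0 and 2 (cluster labels are illustrative)
labels = np.array([0, 2])
chi_hard = hard_assignment(labels, n_states=4)

# Hypothetical scores for the same two configurations
scores = np.array([[4.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 3.0, 0.0]])
chi_soft = soft_assignment(scores)

print(chi_hard)
print(chi_soft.sum(axis=1))  # each row sums to 1
```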

Deep learning can greatly help: powerful

• In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

• Deep learning has been widely applied in numerous fields

Dog: 0.99, Cat: 0.01

https://en.wikipedia.org/wiki/Universal_approximation_theorem

Deep learning can greatly help MSM

Dog: 0.99, Cat: 0.01

Macro1: 0.990, Macro2: 0.005, Macro3: 0.005

Outline



• VAMPnets


VAMPnets for deep learning of molecular kinetics

• VAMPnets: employ the variational approach for Markov processes (VAMP) to develop a deep learning framework for molecular kinetics. A neural network encodes the entire mapping from molecular coordinates to Markov states, thus combining the whole data processing pipeline in a single end-to-end framework.

Noé et al., Nature Communications, 9, 5 (2018)

coordinates

state vector

Training objective: related to the implied timescale plot; maximize it

Understanding VAMPnets

• The basic structure of a neural network

• What is the VAMP score?

Basic structure of a neural network

Forward propagation
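
A minimal sketch of forward propagation through one hidden layer, written in NumPy. The layer sizes are assumptions chosen for illustration (30 inputs for the xyz coordinates of 10 atoms, 6 output states); the hidden width, the tanh activation, and the random weights are hypothetical:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Propagate inputs through one hidden layer and a softmax output layer."""
    h = np.tanh(x @ W1 + b1)                               # hidden activations
    logits = h @ W2 + b2                                   # unnormalized state scores
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)                # state probabilities

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 30, 16, 6   # illustrative sizes only

# Randomly initialized weights; training would adjust these
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

x = rng.normal(size=(5, n_in))      # 5 hypothetical input frames
y = forward(x, W1, b1, W2, b2)
print(y.shape, y.sum(axis=1))       # (5, 6), each row sums to 1
```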

Where can we get the weights?

Backpropagation to update the weights

Define an objective function: $E = \sum_i (y_i^{\mathrm{pred}} - y_i^{\mathrm{true}})^2$

Weights are updated along the direction of steepest descent (the negative gradient of the objective)

http://www.saedsayad.com/images/ANN_4.png
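
The weight update can be sketched with plain gradient descent on a squared-error objective. To keep it short, this example assumes a single linear layer and synthetic data; backpropagation applies the same update to every layer by passing gradients backwards with the chain rule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (hypothetical data, for illustration only)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_true = X @ w_true

w = np.zeros(3)        # initial weights
lr = 0.1               # learning rate (step size)
losses = []

for step in range(200):
    y_pred = X @ w
    residual = y_pred - y_true
    losses.append(np.sum(residual ** 2))        # sum of squared errors
    # Gradient of the squared error with respect to w, scaled by 1/N
    grad = 2.0 * X.T @ residual / len(X)
    w -= lr * grad                              # step against the gradient

print(losses[0], losses[-1])
print(w)
```

The sign matters: stepping along the positive gradient would increase the error, so the update subtracts it.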


https://independentseminarblog.files.wordpress.com/2017/12/giphy.gif


In VAMPnets, the objective function is the VAMP-2 score

VAMP-2 score: objective function

$\chi(x)$: state vector, e.g., $\chi(x) = (0, 1, 0)$ if $x$ belongs to state 2


VAMP-2 score: related to TPM

Sum of the eigenvalues of $T(\tau)^2$

Related to the implied timescale plot; we want to maximize it
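
A rough NumPy sketch of a VAMP-2 style score computed from state vectors at times $t$ and $t + \tau$. It assumes the common form $\| C_{00}^{-1/2} C_{0\tau} C_{\tau\tau}^{-1/2} \|_F^2$ without mean removal, so the trivial singular value 1 is included; the function names are hypothetical, and the exact formulation used in VAMPnets should be checked against the paper:

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix, dropping tiny eigenvalues."""
    w, v = np.linalg.eigh(C)
    keep = w > eps
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T

def vamp2_score(chi_0, chi_t):
    """Squared Frobenius norm of the whitened time-lagged covariance matrix."""
    n = len(chi_0)
    C00 = chi_0.T @ chi_0 / n       # instantaneous covariance at time t
    C0t = chi_0.T @ chi_t / n       # time-lagged covariance
    Ctt = chi_t.T @ chi_t / n       # instantaneous covariance at time t + tau
    K = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
    return np.sum(K ** 2)

# Toy check: three states that never interconvert within the lag time
labels = np.array([0, 1, 2, 0, 1, 2])
chi_0 = np.eye(3)[labels]
chi_t = chi_0.copy()
score = vamp2_score(chi_0, chi_t)
print(score)  # approximately 3: every state maps onto itself
```

Perfectly metastable states give the maximum score (one per state), while fast mixing between states lowers it, which is why training maximizes this quantity.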


VAMPnets: example on alanine dipeptide

Noé et al., Nature Communications, 9, 5 (2018)

10 heavy atoms

xyz for 10 heavy atoms

Output: 6 probabilities

Try to lump to 6 states

VAMPnets: example on alanine dipeptide

• Visualizing the outputs (soft assignments)

• Once we have the state vectors, we can calculate the transition probability matrix (TPM) and obtain the kinetics
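
One way to go from state vectors to a TPM can be sketched as $T = C_{00}^{-1} C_{0\tau}$, assuming the covariance matrices are built from the state vectors at times $t$ and $t + \tau$. For one-hot vectors this reduces to row-normalized transition counts. The function name and the toy trajectory are hypothetical, and no reversibility constraint is enforced:

```python
import numpy as np

def estimate_tpm(chi, lag):
    """Estimate T = C00^{-1} C0t from a (n_frames, n_states) array of state vectors."""
    chi_0, chi_t = chi[:-lag], chi[lag:]
    C00 = chi_0.T @ chi_0           # for one-hot vectors: diagonal matrix of state counts
    C0t = chi_0.T @ chi_t           # for one-hot vectors: transition count matrix
    return np.linalg.solve(C00, C0t)

# Toy trajectory of hard assignments between two states
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0])
chi = np.eye(2)[labels]

T = estimate_tpm(chi, lag=1)
print(T)
print(T.sum(axis=1))  # rows sum to 1
```

With soft assignments the same formula applies, although individual entries are then not guaranteed to be non-negative.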


Comparison with traditional way to build MSM

• Advantages
  • No need to worry about choosing features for tICA or clustering algorithms
  • Inputs are simple: aligned trajectories
  • Finds the variationally optimal model

• Disadvantages
  • Easy to overfit the data
  • Easy to get trapped in local optima


Alanine dipeptide

Outline



• VAMPnets


Other application of deep learning in MSM: CV

• Improve PCA/tICA through a nonlinear transformation trained by a (time-lagged) auto-encoder

• PCA/tICA: find the directions that maximize the variance (PCA) or the time-lagged autocovariance (tICA).

PCA: minimizing reconstruction error

http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/

PCA: Linear version of auto-encoder

Original data Reconstructed data

Wehmeyer and Noe, J. Chem. Phys. 148, 241703 (2018)
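
PCA viewed as a linear autoencoder can be sketched with an SVD: the top-k right singular vectors act as encoder and their transpose as decoder. The data below are synthetic, and the names E and D are chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data (hypothetical), mean-centered as PCA requires
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
E = Vt[:k].T        # linear encoder: 5 features -> k latent coordinates
D = Vt[:k]          # linear decoder: k latent coordinates -> 5 features

Z = X @ E           # latent (principal component) coordinates
X_rec = Z @ D       # reconstructed data

recon_error = np.sum((X - X_rec) ** 2)
print(recon_error, np.sum(S[k:] ** 2))  # the two numbers agree
```

The reconstruction error equals the sum of the discarded squared singular values, which is the smallest error any rank-k linear encoder and decoder pair can reach.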

Improving tICA using time-lagged auto-encoder

Time-lagged autoencoder:

In tICA, D and E are constant matrices

Current frame Next frame
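
A toy sketch of the time-lagged reconstruction objective, assuming a linear encoder E and decoder D trained by gradient descent to predict the frame at $t + \tau$ from the frame at $t$. The two-coordinate trajectory (one slow, one fast) is synthetic; a nonlinear time-lagged autoencoder would replace E and D with neural networks:

```python
import numpy as np

def tae_loss(X_t, X_lag, E, D):
    """Mean squared error between decoded current frames and time-lagged frames."""
    return np.mean(np.sum((X_t @ E @ D - X_lag) ** 2, axis=1))

rng = np.random.default_rng(0)

# Synthetic trajectory: a slowly decorrelating coordinate plus fast noise
n_frames, lag = 2000, 10
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.normal()
fast = rng.normal(size=n_frames)
X = np.column_stack([slow, fast])
X = X - X.mean(axis=0)
X_t, X_lag = X[:-lag], X[lag:]       # current frames and frames one lag time later

E = rng.normal(scale=0.1, size=(2, 1))   # encoder: 2 features -> 1 latent coordinate
D = rng.normal(scale=0.1, size=(1, 2))   # decoder: 1 latent coordinate -> 2 features
lr = 0.05
loss_start = tae_loss(X_t, X_lag, E, D)

for step in range(500):
    Z = X_t @ E
    R = Z @ D - X_lag                             # prediction residual
    grad_D = 2.0 * Z.T @ R / len(X_t)
    grad_E = 2.0 * X_t.T @ (R @ D.T) / len(X_t)
    E -= lr * grad_E
    D -= lr * grad_D

loss_end = tae_loss(X_t, X_lag, E, D)
print(loss_start, loss_end)
print(E.ravel())  # encoder weights on the slow and fast coordinates
```

Because only slowly decorrelating coordinates help predict the lagged frame, the bottleneck is pushed towards the slow coordinate rather than the high-variance one.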


! = #


Time-lagged autoencoder improves over tICA

Villin


Summary

• Deep learning improves MSM construction by reducing the amount of prior knowledge required

• However, deep learning may overfit the data when sampling is insufficient
