Elucidating the Multiple Roles of Hydration in Protein-Ligand Binding via Layerwise Relevance Propagation and Big Data Analytics

Markus Lill, Ying Yang, Amr Mahmoud, Matthew Masters

doi.org/10.26434/chemrxiv.7723223.v1

Submitted date: 15/02/2019. Posted date: 15/02/2019. Licence: CC BY-NC-ND 4.0

Citation information: Lill, Markus; Yang, Ying; Mahmoud, Amr; Masters, Matthew (2019): Elucidating the Multiple Roles of Hydration in Protein-Ligand Binding via Layerwise Relevance Propagation and Big Data Analytics. ChemRxiv. Preprint.

Hydration is a key player in protein-ligand association. No computational method for modeling hydration has so far consistently improved the scoring performance of docking approaches. Using molecular dynamics on thousands of proteins in conjunction with modern deep learning approaches allowed the successful modeling of hydration during scoring of protein-ligand binding poses. This on-the-fly inclusion of hydration information resulted in unprecedented accuracy in binding pose prediction. Big-data analytics based on relevance deduced from the trained neural network revealed that the correct prediction of binding poses depends on three essential pillars of hydration, i.e. water-mediated interactions, desolvation, and enthalpically stable water layers around the bound ligand. The latter form of hydration may open new avenues for optimizing ligands for diverse protein targets.

File list (2): ChemRxiv.pdf (1.23 MiB), ChemRxiv_SI.pdf (310.08 KiB)



Elucidating the Multiple Roles of Hydration in Protein-Ligand Binding via Layerwise Relevance Propagation and Big Data Analytics

Amr H. Mahmoud, Matthew R. Masters, Ying Yang, and Markus A. Lill∗

Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, 575 Stadium Mall Drive, West Lafayette, Indiana 47906, United States

E-mail: [email protected]

Phone: (765) 496-9375. Fax: 765 494-1414

Abstract

Accurate and efficient prediction of protein-ligand interactions has been a long-lasting dream of practitioners in drug discovery. The insufficient treatment of hydration is widely recognized to be a major limitation for accurate protein-ligand scoring. No hydration approach so far has significantly and consistently improved the scoring performance of docking methods. To overcome this issue, we here present the first deep neural network approach encompassing solvation effects with unprecedented accuracy in binding pose ranking. Using an integration of molecular dynamics simulations on thousands of protein structures with novel big-data analytics based on deep Taylor decomposition, three different patterns of hydration were consistently identified to be essential for correct binding-pose prediction. In addition to desolvation and water-mediated interactions, the formation of enthalpically favorable networks of first-shell water molecules around solvent-exposed ligand moieties was identified to be an essential element for protein-ligand binding. Despite being currently neglected in drug discovery, this hydration phenomenon could lead to new avenues in optimizing the free energy of ligand binding.

Introduction

Water is a crucial player in protein-ligand binding processes (1,2). In structure-based drug design, ligands are optimized to replace energetically unfavorable water molecules, particularly ordered water molecules in hydrophobic moieties. This desolvation free energy is often an essential driving force for strong protein-ligand association. In contrast, enthalpically favorable water molecules are often utilized as part of the ligand design process as mediators of protein-ligand interactions.

Traditionally, desolvation effects in docking were modeled by an empirical term characterizing hydrophobic contacts between the protein and the ligand, e.g. using counts of close hydrophobic protein-ligand atom pairs or using a term that is proportional to the solvent-accessible surface (3–6). These approaches implicitly model desolvation effects but ignore the thermodynamic properties of individual binding-site water molecules, which can differ significantly depending on the detailed protein environment (2,7,8).

To explicitly and more precisely model solvation effects of individual water molecules, a spectrum of computational approaches has been developed to identify the likely positions of water molecules in binding sites and to evaluate their energetic stability (9). Accurate estimates of the positions and desolvation energies of individual water molecules can be obtained by Monte-Carlo (MC) or molecular-dynamics (MD)-based simulation methods, such as WaterMAP (2,10) or WATsite (7,8). Considering the excess enthalpy and entropy of the water molecules in the binding site that are replaced by a ligand, the contribution of desolvation to the binding free energy can in principle be computed (2,11,12). For a direct inclusion of explicit desolvation into standard scoring procedures, grid-based adaptations of the inhomogeneous solvation theory (IST) (13,14), such as GIST (15), were utilized. Even with such accurate desolvation estimation, docking performance did not improve significantly or consistently for most protein targets (16). Reasons for these results include the lack of consistent scoring function design and optimization including the explicit desolvation term (the desolvation term has typically been used as a subsequent add-on to existing scoring functions), inaccuracies in the other interaction terms, and the neglect of water-mediated interactions.
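The desolvation estimate mentioned above, built from the excess enthalpy and entropy of displaced binding-site waters, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `HydrationSite` container, the 1.5 Å displacement cutoff, the sign convention (displacing a site with unfavorable free energy relative to bulk contributes favorably to binding), and the numbers, which are made up. It is not the WATsite or WaterMAP implementation.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class HydrationSite:
    xyz: tuple    # site center (x, y, z) in Å
    dH: float     # excess enthalpy relative to bulk water, kcal/mol
    TdS: float    # excess entropy term T*dS relative to bulk water, kcal/mol


def desolvation_contribution(sites, ligand_atoms, cutoff=1.5):
    """Sum -(dH - TdS) over hydration sites displaced by the ligand.

    A site counts as displaced when any ligand atom lies within `cutoff` Å
    of its center (hypothetical criterion chosen for this sketch).
    """
    total = 0.0
    for site in sites:
        if any(dist(site.xyz, atom) <= cutoff for atom in ligand_atoms):
            total -= site.dH - site.TdS
    return total


# Made-up numbers for illustration only.
sites = [
    HydrationSite((0.0, 0.0, 0.0), dH=2.0, TdS=-1.0),    # unfavorable, displaced
    HydrationSite((10.0, 0.0, 0.0), dH=-3.0, TdS=-0.5),  # far from the ligand
]
ligand_atoms = [(0.5, 0.0, 0.0)]
contribution = desolvation_contribution(sites, ligand_atoms)
```

Only the first site is displaced in this toy case, so the favorable water far from the ligand does not enter the sum.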

Different strategies have been devised to model explicit water-mediated interactions in docking studies. One class of approaches explicitly includes water molecules that could be critical for mediating protein-ligand interactions throughout docking. In the simplest approaches, binding-site water molecules, either predicted or from X-ray structures, are manually selected and kept throughout docking as part of the protein template (17). It was, however, observed that the performance of docking depends critically on the studied protein system and on the water-selection process; no consistent improvement in docking performance was observed (17). For flexible treatment of water-mediated interactions, different adaptations of widely used docking programs have been devised. For example, in GOLD (18), water molecules are switched on and off and are allowed to rotate throughout the docking process. In AutoDock (19), water molecules are attached to the docked ligand, and their turning on or off is evaluated during the scoring process for each given docking pose. The energetic evaluation of the (de)solvation process is based on empirical scoring terms. However, so far no scoring method successfully incorporates both the explicit desolvation free energy of individual water molecules and water-mediated interactions into a single, consistent concept.

In addition to hydration, standard scoring functions, i.e. force-field based, empirical or statistical scoring functions, share another major limitation in the context of binding pose and affinity prediction. They all rely on a pre-defined functional form whose parameters are optimized against experimental data such as the binding affinity of protein-ligand complexes. They are reductionist models of the complex thermodynamic process of protein-ligand binding, which results in a significant loss of accuracy in binding pose prediction, and they are typically unsuccessful in precise quantification of binding affinities. This reductionist approach also contributed to the observed convergence in docking quality among different programs (20). One of the underlying shortcomings that causes the reduction in accuracy is the poor representation, or the complete lack thereof, of the environmental context of interactions. For example, the strength of a hydrogen bond is strongly dependent on the hydrophobic character of its environment (21–23).

Recently, several groups utilized convolutional neural networks (CNNs) to generalize the modeling of protein-ligand interactions in the context of scoring functions (24,25). In these approaches, the structural information of a protein-ligand complex is provided to the CNN as 3D images representing the density of different atom types. The architecture of CNNs allows them to learn from this data important local features of protein-ligand interactions without limitation to pre-defined scoring functions that typically rely on two-body interaction models. In principle, the generalized modeling of protein-ligand interactions by CNNs includes the environmental context of interactions. Current CNN models focus on modeling the direct interactions between protein and ligand in the bound state but neglect critical contributions to protein-ligand binding, in particular the solvation and desolvation of protein and ligand upon binding.
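The idea of feeding a complex to a CNN as 3D density images can be made concrete with a small voxelization sketch. It assumes a Gaussian density per atom, a cubic grid, and a single channel each for protein and ligand (the approaches cited above use one channel per atom type); `density_grid` and `build_input` are hypothetical helper names, and the hydration grids are taken as precomputed inputs.

```python
import math


def density_grid(atoms, origin, n, spacing=1.0, sigma=1.0):
    """Return an n x n x n nested list of summed Gaussian atom densities."""
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[i][j][k] = sum(
                    math.exp(-math.dist(point, a) ** 2 / (2.0 * sigma ** 2))
                    for a in atoms
                )
    return grid


def build_input(protein_atoms, ligand_atoms, hydration_grids, origin, n):
    """Collect density channels and precomputed hydration grids by name."""
    channels = {
        "protein": density_grid(protein_atoms, origin, n),
        "ligand": density_grid(ligand_atoms, origin, n),
    }
    # Hydration channels (e.g. occupancy, enthalpy, entropy) are passed in
    # already gridded; their names here are placeholders.
    channels.update(hydration_grids)
    return channels
```

Adding hydration to such a model then amounts to supplying extra channels on the same grid, which is the extension this work pursues.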

Here we present, to the best of our knowledge, the first attempt to include explicit hydration information into CNN-based scoring, titled DeepWATsite (Figure 1), which exhibits significant improvement in binding pose prediction. To train and test a scoring function using CNN, a large dataset of protein-ligand complex structures is required. As the underlying MD simulations for WATsite are rather time-consuming to apply to a large number of protein structures, we have developed a new accelerated WATsite implementation based on GPU acceleration, asynchronous data output and protein truncation (SI Section S1).

A common criticism of deep neural networks is the black-box character of the generated models. To allow for a direct interpretation of the CNN models, we implemented a recently developed form of layer-wise relevance propagation (LRP) methodology based on deep Taylor decomposition (27) (SI Section S2.5). The LRP analysis revealed that DeepWATsite is able to identify and model water-mediated interactions and desolvation contributions to protein-ligand binding. Furthermore, an additional hydration phenomenon critical for protein-ligand binding was revealed: enthalpically favorable first-shell hydration layers around solvent-exposed ligand moieties (28).

Figure 1: Overall procedure and outcome of DeepWATsite. (A) Hydration information for 2423 protein structures from PDBbind was generated using a combination of 3D-RISM, GAsol, MD and WATsite (26). (B) Protein and ligand density for all poses was used in addition to water occupancy, enthalpy and entropy grids as input layers for the CNN (C). (D) An ensemble of CNN models was extracted from the CNN training process, (E) generating a vector of classification scores (1: native, 0: incorrect) for each pose. This vector of scores was used as input to LambdaMART. LambdaMART generated a linear combination of regression trees aiming to optimize the ranking of all poses for a given protein-ligand complex. (F) In addition to significant improvement in pose ranking, the underlying CNN model identified on-the-fly important hydration contributions to ligand binding.

The manuscript is structured to first describe the new WATsite implementation to generate accurate hydration information for thousands of protein-ligand complexes, and second to discuss the incorporation of binding site hydration profiles into the new scoring function DeepWATsite. We then present results on pose prediction accuracy of DeepWATsite and the LRP analysis on select protein-ligand systems.

Results and Discussion

Accelerated Implementation of WATsite

To prove that our new accelerated WATsite3.0 implementation is efficient enough for big-data applications, we compared the simulation performance of WATsite3.0 with the previous WATsite2.0 version using GROMACS, here on the DHFR and HIV-1 protease systems. When considering just the MD simulation portion itself, a three-fold speed increase is achieved using a single GPU (NVIDIA GTX 1080 Ti) with OpenMM compared to using 16 CPU cores with GROMACS (Figure 2A). In addition, the total computation time includes post-analysis for hydration site prediction. The previous WATsite2.0 implementation based on GROMACS performs the free energy calculation for each individual hydration site afterwards in the form of an energy rerun. In contrast, the new implementation of WATsite3.0 based on OpenMM generates water energies during the MD simulation. Whereas this implementation slightly reduces the performance of the OpenMM-WATsite simulation compared to standard OpenMM (cf. Figure 2A, Equ vs Prod), it accelerates the energy analysis significantly. A 15- to 18-fold speed-up can therefore be obtained for the whole hydration site profiling procedure using WATsite3.0 compared to WATsite2.0 (Figure 2A).

Figure 2: (A) Simulation performance of WATsite with and without GPU acceleration. (top) Performance measured in ns/day for simulating DHFR (blue) and HIV-1 protease without (orange) and with a 12 Å threshold radius (gray) using GROMACS on 16 threads and using OpenMM-WATsite without (Equ) and with water energy calculation and output (Prod). (bottom) Total time spent on a 20 ns MD simulation, separated by simulation and post-analysis. (B) Percentage of protein systems with native pose (RMSD < 2 Å) within the top-1, top-3, and top-5 ranked poses using different scoring functions: Vina (green), CNN with protein and ligand information (blue), DeepWATsite with occupancy (yellow), DeepWATsite with entropy, negative and positive enthalpy (purple), and LambdaMART scoring based on a fusion of the two DeepWATsite models (red). (C) LRP analysis on HIV-1 protease in complex with inhibitor A79285 (PDB-code: 1dif). (a) Higher relevance is highlighted by darker red color for binding site residues (shown as sticks) and by darker magenta color for ligand atoms. (b) X-ray structure of the complex of protein (lines, carbon in blue) and ligand (sticks, carbon in green) with surrounding water molecules (red spheres). (c) Water-mediated interaction between protein and ligand. (d) LRP analysis on water occupancy showing relevant water density, in particular around the center of the ligand (green arrows). One area of water density overlaps with the water molecule engaged in mediating interactions between ligand and protein. The darkest red density coincides with a water molecule that is replaced by two hydroxyl groups forming critical and very stable interactions with aspartate side chains of the protein. (e) This particular water density was predicted with negative, i.e. favorable, enthalpy (orange arrow) in the LRP analysis focusing on negative enthalpy. First-shell water molecules with favorable enthalpy are additional relevant hydration features (green arrows). (f) Relevant features with positive enthalpy overlap with hydrophobic ligand moieties (isopropyl and phenyl groups), highlighting the importance of desolvation to ligand binding. Darker color means higher relevance in predicting the native binding pose.

Figure 3: (A) LRP analysis on endothiapepsin complexed with CP-81,282 (PDB-code: 1epo) (a) showing relevant ligand atoms, protein residues and water occupancy. (b) X-ray structure of the protein-ligand complex highlighting the water-mediated interactions between protein and ligand (yellow dashed line) coinciding with the water occupancy of highest relevance. (c) Water density with relevant positive desolvation enthalpy overlaps with hydrophobic groups of the ligand, highlighting the contribution of desolvation to ligand binding to hydrophobic binding site moieties. (B) LRP analysis on thermolysin complexed with (2-sulphanyl-3-phenylpropanoyl)-Phe-Tyr (PDB-code: 1qf0) highlighting (a) most relevant protein residues and ligand atoms, and (b) relevant positive (red) and negative (blue) enthalpy contributions of hydration. (c) X-ray structure of the protein-ligand complex highlighting the extended first-layer water network around the solvent-exposed ligand moiety. (C) LRP analysis on rRNA methyltransferase ErmC’ bound with S-adenosyl-L-homocysteine (PDB-code: 1qan) highlighting (a) most relevant protein residues and ligand atoms, and (b) relevant positive (red) and negative (blue) enthalpy contributions of (de)solvation. (c) X-ray structure of the protein-ligand complex. Same representations as in Figure 2.

Pose Prediction using CNN-WATsite

Using a training set of only 377 complexes, three different CNN models were generated for comparison purposes: one with only occupancy information about hydration (named DeepWATsite (occupancy)), one with positive and negative enthalpy as well as entropy information about hydration (named DeepWATsite (H-,H+,TS)), and one without any hydration information (named CNN(P+L)) (SI Sections S2.1 and S2.2).

Figure 2B compares the number of targets with successful identification of native poses among the top-1, top-3 or top-5 ranked poses using the Vina scoring function, CNN(P+L) scoring using only protein and ligand information, and DeepWATsite including protein, ligand and hydration information. Only the results for the 2046 test systems are shown.
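The top-N success metric used in this comparison can be written down in a few lines. The sketch below assumes each system is a list of `(score, rmsd)` pairs with higher scores ranked first and uses the 2 Å native-pose criterion from Figure 2B; the function name and the toy data are illustrative, not taken from the study.

```python
def top_k_success(systems, k, rmsd_cutoff=2.0):
    """Fraction of systems with a native-like pose among the k best-scored poses.

    systems: list of pose lists, each pose given as (score, rmsd_in_angstrom).
    """
    hits = 0
    for poses in systems:
        ranked = sorted(poses, key=lambda pose: pose[0], reverse=True)
        if any(rmsd < rmsd_cutoff for _, rmsd in ranked[:k]):
            hits += 1
    return hits / len(systems)


# Toy data for illustration only.
systems = [
    [(0.9, 1.0), (0.5, 5.0)],              # native-like pose ranked first
    [(0.8, 4.0), (0.7, 1.5), (0.1, 6.0)],  # native-like pose ranked second
]
top1 = top_k_success(systems, 1)
top3 = top_k_success(systems, 3)
```

For scoring functions where lower values are better, such as docking energies, the sort order would need to be reversed.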

Whereas Vina identifies native poses in 61%, 77% and 85% of all systems in the top-1, top-3 and top-5 ranked poses, respectively, CNN(P+L) significantly outperforms Vina in identifying native poses. CNN(P+L) displays an improvement of 9%, 10% and 7% compared to Vina scoring in the top-1, top-3 and top-5 ranked poses, respectively.

Including hydration information into the CNN model further improves the prediction quality significantly. The occupancy hydration model improves on the prediction of Vina by 16%, 15% and 11%, and the enthalpy/entropy model by 14%, 14% and 11%, at the top-1, top-3 and top-5 level, respectively. If the search algorithm is able to provide a native-like pose, DeepWATsite can rank this pose within the top-5 ranked poses in almost all cases.

To further increase the accuracy of the pose ranking predictions, we made use of the following three observations. First, the occupancy and enthalpy/entropy hydration models capture complementary aspects of hydration. Second, all the previous CNN models considered pose prediction as a point-wise classification problem in which each pose is individually classified as either correct or incorrect. Not all poses with an RMSD > 2 Å, however, are equally wrong, as even high-RMSD poses may share native protein-ligand contacts with the correct binding pose. Third, when extracting well-performing models during the training process, these different models typically contain a native-like pose within the top-5 ranked poses, but its position within this list differs.
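The point-wise labeling described in the second observation reduces to an RMSD threshold. A minimal sketch follows, assuming both coordinate lists contain the same atoms in the same order and ignoring ligand symmetry; the helper names are hypothetical and the authors' actual pose generation and labeling are described in the SI.

```python
import math


def rmsd(pose, native):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native need the same non-zero number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, native))
    return math.sqrt(squared / len(pose))


def label_pose(pose, native, cutoff=2.0):
    """Point-wise class label: 1 for a native-like pose, 0 otherwise."""
    return 1 if rmsd(pose, native) < cutoff else 0
```

A binary label of this kind discards how far a pose is from the native one, which is the limitation the list-wise ranking approach addresses.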

These observations motivated us to adapt the LambdaMART machine learning algorithm (29) to an ensemble of CNN models combining hydration occupancy and thermodynamics (SI Sections S2.3 and S2.4). This technique was used as it considers pose prediction as a ranking problem where the optimization of the ranks of all the poses of a given protein-ligand complex is performed simultaneously in a list-wise fashion. Furthermore, it utilizes all the scores of the ensemble of CNN models at once using gradient boosted decision trees aiming to push the position of the native-like poses to the top rank. This model achieved unprecedented pose ranking accuracy on the large and diverse test set of protein-ligand complexes, predicting a native-like pose correctly at top-1 position in 90 % of all cases (top-3: 97 %, top-5: 99 %) (Figure 2B).
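To illustrate what "pushing native-like poses to the top rank" means in a list-wise setting, the sketch below computes LambdaRank-style pairwise signals weighted by the change in NDCG, which are the quantities gradient boosted trees are fitted to in LambdaMART. It is a generic textbook-style illustration under stated assumptions (binary relevance labels, `sigma = 1`, the function name `lambdarank_pushes`), not the authors' implementation; in the paper the ranker's inputs are the vectors of CNN ensemble scores, whereas `scores` here stands for the ranker's current output per pose.

```python
import math


def lambdarank_pushes(scores, labels, sigma=1.0):
    """Per-pose push values for one protein-ligand complex.

    Positive values mean the pose's score should increase. Each pair with
    labels[i] > labels[j] contributes in proportion to the NDCG change that
    swapping the two poses in the current ranking would cause.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rank = {idx: pos + 1 for pos, idx in enumerate(order)}  # 1-based ranks
    gain = [2 ** label - 1 for label in labels]

    def discount(r):
        return 1.0 / math.log2(1 + r)

    ideal = sorted(gain, reverse=True)
    idcg = sum(g * discount(r) for r, g in enumerate(ideal, start=1))
    pushes = [0.0] * n
    if idcg == 0.0:
        return pushes  # no relevant pose in the list, nothing to optimize
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                delta_ndcg = abs((gain[i] - gain[j])
                                 * (discount(rank[i]) - discount(rank[j]))) / idcg
                rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                pushes[i] += sigma * rho * delta_ndcg
                pushes[j] -= sigma * rho * delta_ndcg
    return pushes


# One native-like pose (label 1) currently ranked last among three poses.
pushes = lambdarank_pushes([0.2, 0.9, 0.5], [1, 0, 0])
```

In this toy case the native-like pose receives a positive push and both decoys receive negative ones, with the top-ranked decoy penalized most because swapping it with the native pose would change NDCG the most.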

Analysis of Relevant Features

While the pose ranking performance of DeepWATsite is significantly improved compared to standard protein-ligand based CNN, it is critical to understand how the inclusion of hydration information caused the quality increase. Furthermore, the utility of DeepWATsite and CNN-based scoring in general as drug design tools critically depends on the human interpretability of the model. Thus, we aimed to generate a detailed analysis method for interpreting the CNN models and for identifying critical features for protein-ligand binding. This could guide the rational optimization of molecules with a new level of accuracy and detail compared to standard docking approaches. For this analysis, we implemented the recently developed layer-wise relevance propagation (LRP) based on deep Taylor decomposition (SI Section S2.5) (27).
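As a rough illustration of how relevance is redistributed from a layer's outputs to its inputs, the sketch below applies the z+ rule, one of the propagation rules derived in the deep Taylor decomposition framework, to a single dense ReLU layer with non-negative inputs. Which rules the authors applied to their convolutional layers is specified in the SI and not in this text, so the rule choice, the dense layer, and the toy weights are all assumptions.

```python
def relu_layer(a, W, b):
    """Forward pass of a dense ReLU layer; W[i][j] connects input i to output j."""
    n_out = len(b)
    return [
        max(0.0, sum(a[i] * W[i][j] for i in range(len(a))) + b[j])
        for j in range(n_out)
    ]


def lrp_zplus(a, W, R_out):
    """Redistribute output relevance R_out onto the inputs with the z+ rule.

    Each input i receives from output j the share a[i] * max(W[i][j], 0)
    divided by the sum of those terms over all inputs, so relevance is
    conserved for every output with a positive denominator.
    """
    n_in, n_out = len(a), len(R_out)
    z = [sum(a[i] * max(W[i][j], 0.0) for i in range(n_in)) for j in range(n_out)]
    R_in = [0.0] * n_in
    for j in range(n_out):
        if z[j] <= 0.0:
            continue  # no positive contribution to distribute
        for i in range(n_in):
            R_in[i] += a[i] * max(W[i][j], 0.0) / z[j] * R_out[j]
    return R_in


# Toy layer: two non-negative inputs, two outputs, zero biases.
a = [1.0, 2.0]
W = [[1.0, -1.0], [0.5, 2.0]]
hidden = relu_layer(a, W, [0.0, 0.0])
R_in = lrp_zplus(a, W, hidden)  # start from relevance equal to the activations
```

Applied layer by layer from the output score back to the input grids, a rule of this kind yields a per-voxel relevance map for each input channel, which is the kind of map inspected in the analysis that follows.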

LRP analysis identifies the binding site residues and ligand groups critical for protein-ligand interactions in the native binding mode. It further highlights the water density most crucial for ligand binding. For example, Figure 2C(a) identifies the central region of the ligand to be the most important for its interaction with the protein. The side chains of residues Ile84 (chain B) and Ile50 (chain A) form critical hydrophobic contacts with the isopropyl and phenyl groups of the ligand (green arrow), anchoring the ligand in the binding site of the protein. The backbone amines of Ile50(A) and Ile50(B), which are predicted by the CNN model with high relevance, form critical water-mediated interactions with two hydrogen bond acceptor groups of the ligand (Figure 2C(b) and (c)).

Co-localized with this central region are highly relevant water molecules (Figure 2C(d)), demonstrating that DeepWATsite identifies critical solvation contributions for protein-ligand interactions. One water molecule (lower green arrow) coincides with the water molecule mediating protein-ligand interactions. Another water molecule (upper green arrow) forms strong interactions with two aspartate residues in the unbound form of the protein. Even in the unbound state, WATsite predicted a favorable enthalpy value for this water location (Figure 2C(e), orange arrow). This water molecule is replaced by a dihydroxy group of the ligand, strongly anchoring the ligand in the binding site of the protein. Additional solvent density with significant relevance and positive enthalpy overlaps with hydrophobic ligand groups, i.e. the two isopropyl and phenyl groups of the symmetric ligand (Figure 2C(f)). As these water molecules are replaced upon ligand binding and therefore contribute favorably to the binding, they highlight the importance of desolvation as a significant driving force for protein-ligand association.

The LRP analysis of all studied targets identified relevant water density for the correct prediction of protein-ligand complexes, highlighting the importance of desolvation free energy contributions and water-mediated interactions for protein-ligand association. For example, in the complex of CP-81,282 with endothiapepsin, a water molecule mediating critical interactions between protein and ligand is identified with the highest relevance (Figure 3A(a), dark red density). Water density with positive enthalpy, i.e. unfavorable water locations, is identified in hydrophobic moieties with high relevance (Figure 3A(c)). Those regions overlap with hydrophobic ligand moieties. This again underlines the importance of desolvation as a driving force for ligand binding, also commonly termed the hydrophobic effect of protein-ligand association.

In thermolysin complexed with (2-sulphanyl-3-phenylpropanoyl)-Phe-Tyr, the LRP analysis reveals that the coordination of the ligand to the zinc ion stabilized by three histidine residues is essential for ligand binding (Figure 3B(a)). Relevant water density overlapping with the ligand is identified, highlighting again desolvation free energy contributions to binding. In addition, water density with high relevance is identified in an extended region adjacent to the ligand (blue density in the lower part of Figure 3B(b)). This density co-localizes with a network of X-ray water molecules. Most of those water molecules do not mediate interactions between protein and ligand, and none is replaced by the ligand upon association. These water molecules instead form a stable network of first-shell water molecules surrounding the exposed phenyl moiety of the ligand. The stability of this network is documented by the negative desolvation enthalpy.

Using a combination of X-ray crystallography, ITC measurements and simulations, Klebe and co-workers (28) recently studied this particular water network around thermolysin inhibitors. They were able to interpret surprising thermodynamic structure-activity relationships with the stability or partial rupture of this network cage around the solvent-exposed ligand moieties. The DeepWATsite CNN model identifies the importance of this water network without any specific user input besides binding pose and WATsite information.

In addition to thermolysin, the DeepWATsite model was able to identify other protein systems with important first-shell water layers around exposed ligand moieties. This observation suggests that this role of water molecules may be of general importance and currently underappreciated compared to desolvation effects and water-mediated interactions. For example, in HIV-1 protease with bound A79285, water density with negative enthalpy was identified around the ligand (Figure 2C(e), green arrows). In rRNA methyltransferase ErmC’ bound with S-adenosyl-L-homocysteine, water density with negative enthalpy and high relevance is identified around the sulfur-containing moiety of the ligand (upper part of Figure 3C(b)), co-localized with four X-ray water molecules. Those water molecules neither directly mediate interactions between protein and ligand nor are replaced by the ligand upon association.

Conclusion

Deep learning has established itself as one of the leading methodologies in image and speech recognition but has only recently entered the field of protein-ligand modeling. Also recent is the emerging emphasis on the importance of inclusion of (de)solvation in modeling protein-ligand interactions. In this article, we presented the first deep learning approach to model protein-ligand interactions incorporating hydration information. The significant improvement of DeepWATsite for pose ranking compared to standard CNN models or empirical scoring functions highlights the utility of this approach for a more precise modeling of protein-ligand interactions.

Of similar importance as the pose ranking quality is the new level of interpretability of the underlying CNN models using the LRP analysis in our study. Several selected examples highlight that the model and its analysis can identify residues and ligand functional groups critical for protein-ligand binding. Furthermore, different types of hydration effects essential for ligand binding were identified, including desolvation, water-mediated interactions and the formation of stable networks of water molecules around exposed ligand moieties.

This information can be valuable in various contexts. For example, the identification of important residues and ligand atoms for interaction can guide the rational modification of a ligand without relying on an imprecise energetic analysis based on simplistic scoring functions that neglect the environmental context of interactions. The identification of critical water-mediated interactions can guide the selection of water molecules to be included during ligand design or during docking. The analysis is also able to reveal enthalpically favorable water networks on the boundary of binding sites that are currently neglected by most drug design projects but can be important for the observed structure-activity relationships, as recently highlighted (28). Our analysis reveals that this ignored facet of solvation may be more widespread in protein-ligand complexes than currently recognized.

Supporting Information Available

All of the materials and methods conducted in this study are detailed in the Supporting Information: Accelerated WATsite Implementation, Preparation of Protein Systems, MD Simulations, Hydration Site Analysis, Grid-based Hydration Analysis, Convergence of Grid-Based Energy Analysis, System Truncation, Pose Generation, Inclusion of Hydration Information into CNN Scoring, Fusion of DeepWATsite models, Pose Ranking Using LambdaMART, Layer-Wise Relevance Propagation (LRP).

References

(1) Ladbury, J. E. Just add water! The effect of water on the specificity of protein-ligand binding sites and its potential application to drug design. Chem. Biol. 1996, 3, 973–980.

(2) Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. Role of the Active-Site Solvent in the Thermodynamics of Factor Xa Ligand Binding. J. Am. Chem. Soc. 2008, 130, 2817–2831.

(3) Böhm, H.-J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des. 1994, 8, 243–256.

(4) Eldridge, M. D.; Murray, C. W.; Auton, T. R.; Paolini, G. V.; Mee, R. P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 1997, 11, 425–445.

(5) Morris, G. M.; Goodsell, D. S.; Halliday, R. S.; Huey, R.; Hart, W. E.; Belew, R. K.;

14

Page 16: Elucidating the Multiple Roles of Hydration in Protein

Olson, A. J. Automated docking using a Lamarckian genetic algorithm and an empirical

binding free energy function. J. Comput. Chem. 1998, 19, 1639–1662.

(6) Huang, S. Y.; Zou, X. Inclusion of solvation and entropy in the knowledge-based scoring function for protein-ligand interactions. J. Chem. Inf. Model. 2010, 50, 262–273.

(7) Hu, B.; Lill, M. A. WATsite: Hydration Site Prediction Program with PyMOL Interface. J. Comput. Chem. 2014, 35, 1255–1260.

(8) Yang, Y.; Hu, B.; Lill, M. A. WATsite2.0 with PyMOL Plugin: Hydration Site Prediction and Visualization. In Methods Mol. Biol. (N.Y., NY, U.S.); Springer, 2017; pp 123–134.

(9) Nittinger, E.; Flachsenberg, F.; Bietz, S.; Lange, G.; Klein, R.; Rarey, M. Placement of Water Molecules in Protein Structures: From Large-Scale Evaluations to Single-Case Examples. J. Chem. Inf. Model. 2018, 58, 1625–1637, PMID: 30036062.

(10) Young, T.; Abel, R.; Kim, B.; Berne, B. J.; Friesner, R. A. Motifs for Molecular Recognition Exploiting Hydrophobic Enclosure in Protein–Ligand Binding. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 808–813.

(11) Higgs, C.; Beuming, T.; Sherman, W. Hydration Site Thermodynamics Explain SARs for Triazolylpurines Analogues Binding to the A2A Receptor. ACS Med. Chem. Lett. 2010, 1, 160–164.

(12) Abel, R.; Salam, N. K.; Shelley, J.; Farid, R.; Friesner, R. A.; Sherman, W. Contribution of Explicit Solvent Effects to the Binding Affinity of Small-Molecule Inhibitors in Blood Coagulation Factor Serine Proteases. ChemMedChem 2011, 6, 1049–1066.

(13) Lazaridis, T. Inhomogeneous Fluid Approach to Solvation Thermodynamics. 1. Theory. J. Phys. Chem. B 1998, 102, 3531–3541.

(14) Lazaridis, T. Inhomogeneous Fluid Approach to Solvation Thermodynamics. 2. Applications to Simple Fluids. J. Phys. Chem. B 1998, 102, 3542–3550.


(15) Nguyen, C. N.; Young, T. K.; Gilson, M. K. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril. J. Chem. Phys. 2012, 137, 044101.

(16) Balius, T. E.; Fischer, M.; Stein, R. M.; Adler, T. B.; Nguyen, C. N.; Cruz, A.; Gilson, M. K.; Kurtzman, T.; Shoichet, B. K. Testing inhomogeneous solvation theory in structure-based ligand discovery. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, E6839–E6846.

(17) Kumar, A.; Zhang, K. Y. J. Investigation on the Effect of Key Water Molecules on Docking Performance in CSARdock Exercise. J. Chem. Inf. Model. 2013, 53, 1880–1892, PMID: 23617355.

(18) Verdonk, M. L.; Chessari, G.; Cole, J. C.; Hartshorn, M. J.; Murray, C. W.; Nissink, J. W.; Taylor, R. D.; Taylor, R. Modeling water molecules in protein-ligand docking using GOLD. J. Med. Chem. 2005, 48, 6504–6515.

(19) Forli, S.; Olson, A. J. A Force Field with Discrete Displaceable Waters and Desolvation Entropy for Hydrated Ligand Docking. J. Med. Chem. 2012, 55, 623–638, PMID: 22148468.

(20) Wang, Z.; Sun, H.; Yao, X.; Li, D.; Xu, L.; Li, Y.; Tian, S.; Hou, T. Comprehensive evaluation of ten docking programs on a diverse set of protein–ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 2016, 18, 12964–12975.

(21) Hu, M.; Urbic, T. Strength of hydrogen bonds of water depends on local environment. J. Chem. Phys. 2012, 136, 144305.

(22) Grdadolnik, J.; Merzel, F.; Avbelj, F. Origin of hydrophobicity and enhanced water hydrogen bond strength near purely hydrophobic solutes. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 322–327.


(23) Gao, J.; Bosco, D. A.; Powers, E. T.; Kelly, J. W. Localized thermodynamic coupling between hydrogen bonding and microenvironment polarity substantially stabilizes proteins. Nat. Struct. Mol. Biol. 2009, 16, 684–690.

(24) Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein–Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942–957.

(25) Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. arXiv preprint arXiv:1510.02855 2015, 1–11.

(26) Masters, M. R.; Mahmoud, A. H.; Yang, Y.; Lill, M. A. Efficient and Accurate Hydration Site Profiling for Enclosed Binding Sites. J. Chem. Inf. Model. 2018, 58, 2183–2188, PMID: 30289252.

(27) Montavon, G.; Lapuschkin, S.; Binder, A.; Samek, W.; Müller, K. R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 2017, 65, 211–222.

(28) Biela, A.; Betz, M.; Heine, A.; Klebe, G. Water Makes the Difference: Rearrangement of Water Solvation Layer Triggers Non-additivity of Functional Group Contributions in Protein–Ligand Binding. ChemMedChem 2012, 7, 1423–1434.

(29) Burges, C. J. From RankNet to LambdaRank to LambdaMART: An overview. Learning 2010, 11, 81.


Supporting Information

Amr H. Mahmoud, Matthew R. Masters, Ying Yang, Markus A. Lill

December 9, 2018

S1 Hydration Profiling in Context of Big-Data Analytics

S1.1 Accelerated WATsite Implementation

To make the inclusion of WATsite hydration data into the CNN possible, hydration profiling had to be performed for a large set of protein structures. This required a new implementation of WATsite with a significant speed-up.

WATsite simulations were performed using OpenMM on GPU architecture. The OpenMM code was modified to allow for on-the-fly calculation and reporting of the interaction energies between each water molecule and its surroundings. For each water molecule in the system, the interaction energy between this water molecule and all other protein, co-factor, and water residues is calculated asynchronously to the MD propagation, which increases the speed of the simulation. The water-interaction energies are subsequently averaged over all MD snapshots to calculate the de-solvation enthalpy in WATsite. The de-solvation entropy is computed based on the MD trajectory [1, 2]. For further speed-up of the MD simulation protocol, the protein was truncated beyond a sphere positioned in the center of the binding site. This truncation is motivated by the fact that the protein is restrained in all WATsite simulations to achieve convergence in water occupancy and free-energy profiling.

S1.2 Preparation of Protein Systems

For validating and benchmarking the accelerated WATsite version, HIV-1 protease (PDB: 1PHX) was used as the protein system. In addition to HIV-1 protease, dihydrofolate reductase (DHFR) was used to measure the efficiency of the new implementation. This system allows for direct comparison with other MD programs such as Amber, Charmm, or OpenMM, where it has been used as a standard benchmark. For training and validation of the CNN models, 2423 protein-ligand structures from PDBbind were used. For each protein, the co-crystallized ligand was removed from its complex structure in order to profile the hydration site information in the binding site of each protein. Protein structures were prepared using Schrödinger's Protein Preparation Wizard [3]. In short, hydrogen positions, bond orders, protonation states of histidines, glutamic and aspartic acids, and side-chain conformations of asparagine and glutamine were optimized using the default protocol.

S1.3 MD Simulations

MD simulations were performed using the modified GPU-accelerated OpenMM-WATsite package with the AMBER14SB force field [4] and the SPC/E water model [5, 6]. The SHAKE algorithm [7] was applied to constrain bonds including hydrogen atoms to their equilibrium lengths and to maintain rigid water geometries. Long-range electrostatic interactions were treated with the Particle Mesh Ewald method [8] with a cutoff of 10 Å for the direct interactions. The Lennard-Jones interactions were truncated at a distance of 10 Å, and a long-range isotropic correction was applied to the pressure representing Lennard-Jones interactions beyond the cutoff. A Langevin integrator with a time step of 2 fs was used together with a stochastic thermostat collision frequency of 1 ps⁻¹. Pressure control was implemented via isotropic box edge adjustment by Monte Carlo (MC) moves every 25 time steps, simulating the effect of constant pressure. The system was first energy minimized and then heated to 298 K over 50 ps of MD simulations, followed by 1 ns of equilibration MD simulations at 298 K and 1 bar with periodic boundary conditions in all three dimensions. During the minimization and equilibration process, all protein heavy atoms were harmonically restrained with a spring constant of 4.8 kcal mol⁻¹ Å⁻².

S1.4 Hydration Site Analysis

A detailed description of the WATsite methodology can be found in previous publications [1, 2]. In short, a 3D grid was placed over the user-defined binding site. The occupancy of water molecules was distributed onto the 3D grid using all snapshots generated throughout the production run of the MD simulation. The distribution of the occupancy was then normalized, and the DBSCAN clustering algorithm [9] was used to identify the pronounced peaks that define the hydration site locations. For each identified hydration site, WATsite tools were used to compute the enthalpic ($\Delta H_{hs}$) and entropic ($\Delta S_{hs}$) change of transferring a water molecule from the bulk solvent to the hydration site of the protein binding site. The output files from the modified OpenMM-WATsite code containing the individual water interaction energies were used in this step.

S1.5 Grid-based Hydration Analysis

The previous implementations of WATsite predicted hydration sites representing the cluster centers of high solvent density regions (see previous section). This localized representation of water occupancy is not well suited as input for a CNN, which requires 3D grid data. Inspired by the grid inhomogeneous solvation theory (GIST) [10], an alternative grid-based analysis was implemented into WATsite. Similar to hydration site prediction, a 3D grid was placed over the user-defined binding site. The occupancy of water molecules was distributed onto the 3D grid using all snapshots generated throughout the production run of the MD simulation. In contrast to standard WATsite, the occupancy is not clustered into hydration sites, but every grid point with an occupancy larger than twice the bulk occupancy is considered a 'pseudo-hydration site', and any water molecule within a radius of 1 Å throughout the MD trajectory is considered to contribute to it. The de-solvation enthalpy and entropy of the 'pseudo-hydration site' are calculated similarly to the original hydration site analysis, using the contribution of any water molecule within a radius of 1 Å throughout the MD trajectory.
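The selection of pseudo-hydration sites described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WATsite code: the function names, the dictionary-based grid, and the inclusive 1 Å test are hypothetical choices made for the example.

```python
import math


def pseudo_hydration_sites(occupancy, bulk_occupancy, factor=2.0):
    """Return grid points whose occupancy exceeds factor * bulk occupancy.

    occupancy maps a grid index (i, j, k) to its normalized water occupancy.
    """
    threshold = factor * bulk_occupancy
    return sorted(point for point, occ in occupancy.items() if occ > threshold)


def contributing_waters(site_xyz, water_positions, radius=1.0):
    """Return the water positions (in Angstrom) within `radius` of a site."""
    return [w for w in water_positions if math.dist(site_xyz, w) <= radius]


# Toy grid with a bulk occupancy of 1.0 (hypothetical values).
occupancy = {(0, 0, 0): 0.5, (0, 0, 1): 2.5, (0, 1, 0): 1.9}
sites = pseudo_hydration_sites(occupancy, bulk_occupancy=1.0)
```

In this toy case only the grid point with occupancy 2.5 passes the twice-bulk criterion; the energetic averaging over the contributing waters is not shown.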

In order to achieve fast convergence of the thermodynamic profile of hydration data, in particular for water molecules in enclosed binding sites, we utilized our recently developed protocol that uses GAsol, which relies on 3D-RISM, for initial water placement [11].

S1.6 Convergence of Grid-Based Energy Analysis

We performed convergence studies of the grid-based energy analysis. A 100 ns MD simulation was performed for the production run of the full protein system, and snapshots were saved every picosecond in NetCDF format, generating 100,000 frames. To test the convergence of the water-energy calculation, we performed grid-based hydration analysis for the first 1, 2, 3, 4, 5, 7, 10, 20, and 50 ns of the 100 ns MD simulation. Energy grids predicted from shorter simulations were compared to those from 100 ns by calculating the squared Pearson correlation coefficient r² between the two sets of energy calculations. The r² values for the ∆G grids (when compared to the full 100 ns simulation results) increase from 0.84 to 0.91 to 0.94 when extending the analysis from the first 1 ns to 3 ns to 5 ns of the MD simulation (Table S1). Based on this study, WATsite simulations were performed for 5 ns for each protein system.

                              r²
Checkpoint (ns)   Occupancy   ∆G     ∆H     -T∆S
1                 0.98        0.84   0.85   0.85
2                 0.99        0.89   0.90   0.89
3                 0.99        0.91   0.91   0.91
4                 0.99        0.93   0.94   0.93
5                 0.99        0.94   0.94   0.94
7                 1.00        0.95   0.96   0.95
10                1.00        0.96   0.97   0.96
20                1.00        0.97   0.98   0.98
50                1.00        0.99   0.99   0.99

Table S1: Pearson correlation coefficient r² for occupancy and energy grids of HIV-1 protease at different simulation lengths compared to a simulation with a total length of 100 ns.
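The convergence comparison can be illustrated with a short sketch. It assumes that the two grids have been flattened into equal-length lists of values at matching grid points; the function name and the toy numbers are hypothetical and are not taken from the WATsite code or from Table S1.

```python
def pearson_r2(grid_a, grid_b):
    """Squared Pearson correlation between two flattened grids."""
    if len(grid_a) != len(grid_b) or len(grid_a) < 2:
        raise ValueError("grids must have the same length (at least 2)")
    n = len(grid_a)
    mean_a = sum(grid_a) / n
    mean_b = sum(grid_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(grid_a, grid_b))
    var_a = sum((a - mean_a) ** 2 for a in grid_a)
    var_b = sum((b - mean_b) ** 2 for b in grid_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant grid")
    return cov * cov / (var_a * var_b)


# Toy energy grids from a short and a long simulation (hypothetical values).
short_run = [-1.2, 0.4, 2.1, -0.3]
full_run = [-1.0, 0.5, 2.0, -0.5]
r2 = pearson_r2(short_run, full_run)
```

A value close to 1 indicates that the shorter simulation already reproduces the grid of the reference run.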


S1.7 System Truncation

For speed-up of the molecular dynamics (MD) simulation protocol, the protein was truncated beyond a sphere positioned in the center of the binding site. Different sphere radii or cutoffs (12, 15, 17, 20, 25, and 30 Å) were tested. We analyzed the impact of such truncations by comparing the WATsite predictions to those of the full protein system. For all truncated systems, a Python script was used to add capping acetyl (ACE) and amide (NME) groups to the break points in the protein sequence. The protein structures were then solvated in an orthorhombic box of water molecules. A minimum distance of 10 Å between any protein atom and the faces of the box was chosen. Hydration site and grid-based analysis were performed using 20 ns MD simulations of HIV-1 protease truncated at different cutoff distances around the center of the binding site. For the grid-based analysis, the values of the overlap coefficient OC (Equation 1)

\[
\mathrm{OC} = \sum_{i=1}^{N} \min\left( \frac{p^1_i}{\sum_{j=1}^{N} p^1_j} ;\; \frac{p^2_i}{\sum_{j=1}^{N} p^2_j} \right) \tag{1}
\]

of energy grids comparing the truncated system (energy value $p^1_i$ at grid point $i$) with the full system ($p^2_i$) are reported in Table S2. With OC values of 0.91, 0.91, and 0.93 for ∆G, ∆H, and -T∆S, respectively, truncation at 12 Å appears to reproduce the results of the full system while increasing the simulation speed for the test system by about 80 ns/day (255 versus 173 ns/day). It should be noted that the full HIV-1 protease system consists of only 198 protein residues. Thus, the speed increase is not as significant as expected, in particular in the range of 17 to 25 Å truncation. The advantage of truncation is more pronounced for larger systems. The system with a 30 Å cutoff truncation consists of the entire protein and was used as the reference for the full system. To be on the safe side, we selected a truncation radius of 15 Å for the current study.

                             OC
Truncation (Å)   Occupancy   ∆G     ∆H     -T∆S   Speed (ns/day)
12               0.97        0.91   0.91   0.93   255
15               0.97        0.91   0.91   0.93   214
17               0.97        0.92   0.91   0.94   189
20               0.97        0.93   0.93   0.95   173
25               0.97        0.93   0.93   0.95   173
30               1.00        1.00   1.00   1.00   173

Table S2: Effects of truncation on the accuracy of grid energies and the speed of simulation.
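The overlap coefficient of Equation 1 can be written in a few lines. The sketch below assumes non-negative grid values (for example occupancies or energy magnitudes); how signed energy values were handled is not stated in the text, so this restriction is an assumption, as is the function name.

```python
def overlap_coefficient(p1, p2):
    """OC = sum_i min(p1_i / sum(p1), p2_i / sum(p2)) for two flattened grids."""
    if len(p1) != len(p2):
        raise ValueError("grids must have the same number of points")
    if any(v < 0 for v in p1) or any(v < 0 for v in p2):
        raise ValueError("this sketch assumes non-negative grid values")
    total1, total2 = sum(p1), sum(p2)
    if total1 == 0 or total2 == 0:
        raise ValueError("each grid needs a non-zero sum for normalization")
    return sum(min(a / total1, b / total2) for a, b in zip(p1, p2))


# Identical distributions overlap completely, disjoint ones not at all.
same = overlap_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
disjoint = overlap_coefficient([1.0, 0.0], [0.0, 1.0])
```

Because both grids are normalized to unit sum, OC ranges from 0 (no overlap) to 1 (identical normalized distributions), which matches the reference row of Table S2.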

S2 Binding Pose Prediction using CNN

S2.1 Pose Generation

Protein-ligand complexes from the PDBbind refined set were selected as training and test sets for pose prediction with DeepWATsite. Similar to the work of Ragoza et al. [12], the co-crystallized ligand was extracted and re-docked using Smina [13] with the AutoDock Vina scoring function [14]. Docking was performed into rigid protein structures with all water molecules stripped. Protonation states for ligand and protein were determined using OpenBabel [15]. For each system, 25 poses were generated and classified based on their RMSD to the X-ray binding pose. Poses with an RMSD < 2 Å were considered native or active poses, and poses with an RMSD > 4 Å were considered decoy poses. Poses with an RMSD between 2 and 4 Å were not considered in the training process. As the focus is on the development of an improved scoring function, only systems with at least one native-like pose (RMSD < 2 Å to the native binding pose) were selected for subsequent CNN modeling.
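The labeling and filtering rules of this section can be sketched as follows. The function names are hypothetical, and the RMSD values are assumed to have been computed beforehand against the X-ray pose.

```python
def label_pose(rmsd, native_cutoff=2.0, decoy_cutoff=4.0):
    """Label a docked pose by its RMSD (in Angstrom) to the X-ray pose."""
    if rmsd < native_cutoff:
        return "native"
    if rmsd > decoy_cutoff:
        return "decoy"
    return None  # intermediate poses are not used for training


def select_training_poses(rmsds):
    """Return (rmsd, label) pairs for one system, or [] if it has no native pose."""
    labels = [label_pose(r) for r in rmsds]
    if "native" not in labels:
        return []  # system skipped: no native-like pose was generated
    return [(r, lab) for r, lab in zip(rmsds, labels) if lab is not None]


# Toy system with four docked poses (hypothetical RMSD values).
poses = select_training_poses([0.8, 3.1, 5.6, 7.2])
```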

S2.2 Inclusion of Hydration Information into CNN Scoring

Protein and ligand atoms were modeled as density on a 3D grid encompassing the binding site of the protein. This data was provided to the CNN model in the form of 34 different 3D grids as input channels representing the density of 34 distinct atom types (16 protein and 18 ligand types). The atom type classification was based on Smina atom type rules [13]. The density distribution of each protein or ligand grid was computed using the following piece-wise continuous function [12]

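\[
\rho_i(r) =
\begin{cases}
\exp\left(-\dfrac{2 r^2}{R_i^2}\right) & \text{if } 0 \le r < R_i \\[2ex]
\dfrac{4}{e^2 R_i^2}\, r^2 - \dfrac{12}{e^2 R_i}\, r + \dfrac{9}{e^2} & \text{if } R_i \le r < 1.5 R_i \\[2ex]
0 & \text{if } 1.5 R_i \le r
\end{cases}
\tag{2}
\]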

where $R_i$ is the van der Waals radius of atom type $i$ and $r$ is the distance between the atom center and the grid point at which the atom density was computed. Each 3D grid covered a volume of 24 × 24 × 24 Å³ using a grid spacing of 0.5 Å. The data was provided as MolGrid input layers to Gnina [12] with Caffe as the deep-learning framework [16].
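Equation 2 translates directly into code. The sketch below is a plain transcription for a single atom and a single grid point; the function name is hypothetical, and the radius of 1.7 Å in the example is only an illustrative value.

```python
import math


def atom_density(r, radius):
    """Density of one atom at distance r from its center (Equation 2).

    Gaussian inside the van der Waals radius, quadratic decay up to
    1.5 * radius, and zero beyond.
    """
    if r < 0 or radius <= 0:
        raise ValueError("r must be >= 0 and radius must be > 0")
    if r < radius:
        return math.exp(-2.0 * r * r / (radius * radius))
    if r < 1.5 * radius:
        e2 = math.exp(2.0)
        return (4.0 * r * r / (e2 * radius * radius)
                - 12.0 * r / (e2 * radius)
                + 9.0 / e2)
    return 0.0


# The two branches meet at r = R, where both give exp(-2).
value_at_radius = atom_density(1.7, 1.7)
```

The quadratic branch equals exp(-2) at r = R and reaches zero at r = 1.5 R, so the function is continuous over the whole range.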

Occupancy and thermodynamic values generated by WATsite on 3D grids encompassing the binding site were provided as additional input channels to the CNN model. We developed two different CNN models including hydration information. One model was constructed using only hydration occupancy in addition to protein and ligand density. This model is purely based on density information of protein, ligand, and water molecules.

The other model, in contrast, utilizes energetic values of hydration in addition to protein and ligand density. Only enthalpy and entropy were used as input, as the free energy is the sum of the two. To allow for normalized input between 0 and 1, a hyperbolic tangent function was used to scale the thermodynamic values. Whereas the entropy values were always positive with respect to bulk solvent entropy, the enthalpy values can assume positive and negative values. Thus, we separated the enthalpy data into two different input channels, one for positive enthalpy values and one for negative values. The latter were taken as absolute values before normalization by the hyperbolic tangent function.
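The channel construction for the thermodynamic model can be sketched per grid point as follows. The text does not state whether the values were rescaled before applying the hyperbolic tangent, so the sketch applies tanh directly; this, the function name, and the argument names are assumptions.

```python
import math


def hydration_channels(enthalpy, entropy_term):
    """Map one grid point to (positive-enthalpy, negative-enthalpy, entropy) inputs.

    Each returned value lies in [0, 1). The entropy term is assumed to be
    non-negative, as stated in the text.
    """
    if entropy_term < 0:
        raise ValueError("entropy term is expected to be non-negative")
    h_pos = math.tanh(enthalpy) if enthalpy > 0 else 0.0
    h_neg = math.tanh(abs(enthalpy)) if enthalpy < 0 else 0.0
    return h_pos, h_neg, math.tanh(entropy_term)


# A grid point with favorable (negative) enthalpy, hypothetical values.
h_pos, h_neg, s_chan = hydration_channels(-1.5, 0.8)
```

Splitting the enthalpy by sign keeps every channel non-negative, in line with the density channels for protein and ligand atoms.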

Thus, in total 35 and 37 input channels are provided to the two CNN models in the form of 3D grids, respectively. CNN models using only the 34 protein and ligand input channels were generated for comparison, to validate the importance of solvation information in the CNN modeling.

In addition to the input layer, the CNN architecture contained three convolutional layers with rectified linear unit (ReLU) activations coupled with intermittent max pooling layers, one fully connected layer, and a final softmax layer that assigned a probability to each of the two output classes of the classifier model, native (RMSD to X-ray pose < 2 Å) or decoy pose.

Figure S1: Architecture of CNNs used in this study.

The models were trained using the Caffe framework for 40,000 iterations using a stochastic gradient descent variant. Only 377 randomly selected protein structures were used for training. This choice allows a rigorous validation of the model's applicability on a large and diverse external test set of 2046 structures. To allow for sufficient optimization of the model, the training set was augmented using random rotation and translation (up to 2 Å) of the input images (3D grids).
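The augmentation step can be illustrated on atom coordinates. This sketch makes several assumptions that the text does not specify: the transform is applied to coordinates before gridding rather than to the grids themselves, the 2 Å limit is applied per axis, and the rotation is drawn from a random axis and angle (which is simple but not uniform over all rotations). The function name is hypothetical.

```python
import math
import random


def random_rigid_transform(coords, center, max_shift=2.0, rng=random):
    """Rotate coords about center, then translate each axis by up to max_shift."""
    # Random unit rotation axis from a normalized Gaussian vector.
    norm = 0.0
    while norm < 1e-8:
        axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    # Rodrigues rotation matrix for the chosen axis and angle.
    rot = [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    moved = []
    for p in coords:
        d = [p[k] - center[k] for k in range(3)]
        moved.append(tuple(
            sum(rot[row][k] * d[k] for k in range(3)) + center[row] + shift[row]
            for row in range(3)
        ))
    return moved


# Toy example: three atoms, rotated about the origin with a seeded generator.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
augmented = random_rigid_transform(atoms, center=(0.0, 0.0, 0.0), rng=random.Random(0))
```

Because the transform is rigid, all interatomic distances are preserved and only the orientation and position of the complex within the grid change.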

S2.3 Fusion of DeepWATsite models

The occupancy and enthalpy/entropy hydration models capture complementary aspects of hydration. To further increase the accuracy of the pose ranking predictions, a DeepWATsite model was generated combining the features from the occupancy model with those of the model based on entropy, negative enthalpy, and positive enthalpy. The output nodes from the last convolutional layers of both models were concatenated as input for a final single fully-connected layer (Figure S2). This allows the model to extract geometric and energetic features separately from each other before combining them for a final classification of poses.

Figure S2: Fusion of CNN models based on protein-, ligand-, and water-occupancy layers (cplx occ) and protein- and ligand-occupancy with entropy and enthalpy information of hydration (cplx thermo). The model consists of three pooling layers (unitA pool, unitA poolb with A = 1, 2, 3) and convolutional layers (unitA conv1 + unitA relu with A = 1, 2, 3) each, and a final fully-connected layer (output fc) after concatenating (concat) the output of the final convolutional layers.

S2.4 Pose Ranking Using LambdaMART

Traditional supervised machine learning aims to solve the prediction problem (classification or regression) for a single instance at a time. In our case, each pose was classified as a native (RMSD < 2 Å) or incorrect (RMSD > 4 Å) pose, despite the fact that not every high-RMSD pose is equally wrong. Native protein-ligand interactions may still be partially conserved for poses with RMSD > 4 Å. The loss function of the classification CNN model, however, cannot distinguish between poses that partially conserve native contacts and poses that do not. This limits the pose prediction accuracy obtained during the training process of the CNN.

To overcome this issue, we used our CNN models as input for subsequent LambdaMART ranking [17]. In contrast to traditional supervised classification, LambdaMART is a supervised machine learning technique that aims to rank a list of instances, here all poses of a specific protein-ligand complex, and therefore allows distinguishing between poses that partially conserve native contacts and poses that do not. LambdaMART uses gradient boosted decision trees with a loss function derived from LambdaRank [17] for solving the ranking task.

For the purpose of pose ranking, we used the fused DeepWATsite model described in the previous section. Forty models were collected during the training process. Each pose $i$ obtained a classification score $c_{i,a} \in [0, 1]$ for each of the $a = 1, \ldots, 40$ models. A linear combination of regression trees $m = 1, \ldots, M$ was built using the $c_i$ ($i = 1, \ldots, N$) as descriptor vectors of length 40. $N$ is the number of samples, here poses. Each subsequent tree $Q$ tries to reduce the prediction error of the linear combination of regression trees $m = 1, \ldots, Q-1$, aiming to improve the ranking of native poses for a given protein target compared to all decoy poses. This ensemble of trees was optimized using Newton's method based on the gradients derived from LambdaRank [17].

Algorithm 1 LambdaMART

 1: for i = 1, ..., N do
 2:     F_0(c_i) = 0
 3: end for
 4: for m = 1, ..., M do
 5:     for i = 1, ..., N do
 6:         Compute gradient λ_i
 7:         Compute second derivative w_i
 8:     end for
 9:     Create regression tree with K leaves per tree: {R_km}, k = 1, ..., K
10:     Compute step size of Newton optimization step = value of leaf k in tree m: γ_km = (Σ_{c_i ∈ R_km} λ_i) / (Σ_{c_i ∈ R_km} w_i)
11:     F_m(c_i) = F_{m−1}(c_i) + η Σ_k γ_km 1(c_i ∈ R_km)
12: end for

In detail, the base model, i.e. the initial prediction, for each sample $i$ is set to zero (lines 1-3). Regression trees $m = 1, \ldots, M$ are then added successively (lines 4-12). For tree $Q$, first, the gradient $\lambda_i$ (line 6) and the second derivative $w_i$ (line 7) of the loss function with respect to the predicted ranking score $F_{Q-1}(c_i) = s_i$ are computed. We used a cross-entropy loss function and a sigmoid function to map the ranking scores $s_i$ and $s_j$ to the probability that a pair of poses $i$ and $j$ is correctly ranked

\[
P_{ij} \equiv P(\mathrm{Pose}_i \triangleright \mathrm{Pose}_j) \equiv \frac{1}{1 + e^{-\sigma (s_i - s_j)}} \tag{3}
\]

with $s_i = F_{Q-1}(c_i)$ and $s_j = F_{Q-1}(c_j)$. The gradient $\lambda_i$ is then

\[
\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ij} \tag{4}
\]

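with the pairwise term obtained by differentiating the cross-entropy loss $-\log P_{ij}$ with respect to $s_i$,

\[
\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}}. \tag{5}
\]

The two terms of the difference in Equation 4 count the number of correct (first term) and incorrect (second term) rankings in the list of poses sorted based on native versus decoy poses. To push native poses to the top of the ranking list, higher ranked poses in the current prediction using $Q-1$ trees should obtain a higher weight in the gradient. Therefore, the $\lambda_{ij}$ are multiplied by the difference in normalized discounted cumulative gain (NDCG) obtained by swapping two entries in the ranking list

\[
\mathrm{NDCG}(T) = \frac{\mathrm{DCG}(T)}{\max(\mathrm{DCG}(T))}, \qquad
\mathrm{DCG}(T) = \sum_{i=1}^{T} \frac{2^{l_i} - 1}{\log(1 + i)} \tag{6}
\]

where $T$ is the truncation level of the ranked list and $l_i$ is the relevance label of the pose at rank $i$. Thus,

\[
\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}} \, \Delta\mathrm{NDCG}(T). \tag{7}
\]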


The next tree is created (line 9), where $R_{km}$ describes the disjoint region of the regression tree for leaf $k$ in tree $m$. The step size of the Newton optimization step is computed (line 10), which corresponds to the value $\gamma_{km}$ assigned to leaf $k$ in tree $m$. Finally, the new regression tree $m$ is added to the linear combination of trees $1, \ldots, m-1$ (line 11). $\eta$ is the learning rate in LambdaMART.
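The gradient computation in lines 5-8 of Algorithm 1 can be sketched as below. This is a minimal illustration, not the authors' implementation: the binary labels (1 = native, 0 = decoy), the function names, and the toy scores are assumptions. The pairwise term is the derivative of $-\log P_{ij}$ (Equation 3) with respect to $s_i$, weighted by the absolute NDCG change of swapping the two poses, and the second derivative follows the form given by Burges [17]. Tree fitting is not shown.

```python
import math


def dcg(labels_in_rank_order, truncation):
    """DCG(T) = sum over ranks i = 1..T of (2^l_i - 1) / log(1 + i)."""
    top = labels_in_rank_order[:truncation]
    return sum((2 ** l - 1) / math.log(1 + i) for i, l in enumerate(top, start=1))


def ndcg(labels_in_rank_order, truncation):
    ideal = dcg(sorted(labels_in_rank_order, reverse=True), truncation)
    return dcg(labels_in_rank_order, truncation) / ideal if ideal > 0 else 0.0


def lambda_gradients(scores, labels, sigma=1.0, truncation=None):
    """Return per-pose cost gradients (lambdas) and second derivatives (weights)."""
    n = len(scores)
    truncation = n if truncation is None else truncation
    order = sorted(range(n), key=lambda k: -scores[k])  # current ranking
    ranked = [labels[k] for k in order]
    position = {pose: pos for pos, pose in enumerate(order)}
    base = ndcg(ranked, truncation)
    lambdas = [0.0] * n
    weights = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # keep only pairs where pose i should outrank pose j
            swapped = ranked[:]
            pi, pj = position[i], position[j]
            swapped[pi], swapped[pj] = swapped[pj], swapped[pi]
            delta = abs(ndcg(swapped, truncation) - base)
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = -sigma * rho * delta
            lambdas[i] += lam
            lambdas[j] -= lam
            curvature = sigma * sigma * rho * (1.0 - rho) * delta
            weights[i] += curvature
            weights[j] += curvature
    return lambdas, weights


# Toy complex: the native pose (index 0) currently has the lowest score.
lambdas, weights = lambda_gradients([0.2, 0.9, 0.5], [1, 0, 0])
```

In this sketch a negative $\lambda_i$ means that raising the score of pose $i$ lowers the pairwise cost, so the misranked native pose receives a negative value and the decoys positive ones. Some presentations absorb this sign into the leaf values of line 10.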

S2.5 Layer-Wise Relevance Propagation (LRP)

Despite the impressive performance of CNNs in many applications related to image recognition, the models typically act as 'black boxes' with a lack of interpretability. This 'black box' character is the result of the underlying nonlinear structure of the neural network. Layer-wise relevance propagation (LRP) is a recent method that aims to map the outcome of a model back to the input layers [18]. An adaptation of the initial method was recently tested on the study of protein-ligand interaction prediction [19]. The initial LRP, however, allowed for the mixture of positive and negative relevance values, which results in artifacts such as large positive or negative values distant from the binding site.

In this paper we implemented a new LRP methodology for interpreting generic multilayer neural networks based on the deep Taylor decomposition method [20]. Deep Taylor decomposition is a method to explain the individual predictions of deep neural networks in terms of the input layers. It functions by performing a layer-by-layer backward pass on the network, starting from the classification between native (output value = 1) and decoy pose (output value = 0), using the following rules. According to the new LRP analysis, the positively defined classifier outcome in node $j$ of output layer $N$ (= classification as active pose for a given input $x$), named relevance $R^N_j(x)$, needs to be conserved in each layer during the backward pass

\[
\sum_i R^1_i(x) = \sum_i R^2_i(x) = \ldots = \sum_i R^{N-1}_i(x) = R^N_j(x) \tag{8}
\]

Furthermore, the relevance in each layer measures the contribution of each node to the decision that the pose is native, with zero meaning no contribution. Therefore, all relevance values should be positive or zero

\[
\forall i, K, x : \; R^K_i(x) \ge 0 \tag{9}
\]

The Taylor expansion at each node $i$ of layer $N-1$ estimates the contribution of this node to the positive outcome on node $j$ (= native pose) in the output layer $N$. The relevance values from layer $N-1$ are subsequently propagated to layer $N-2$ using Taylor decomposition on each node $i$ of layer $N-2$ with respect to its overall relevance contribution to layer $N-1$. Considering that the output value of layer $N$ and the output of each ReLU node in the convolutional layers are non-negative, the following $z^+$ rule can be derived for the relevance back-propagation [20]

\[
R^{K-1}_i(x) = \sum_j \frac{z^+_{ij}}{\sum_m z^+_{mj}} \, R^{K,+}_j(x) \tag{10}
\]

using only positive activation contributions $z^+_{ij} = x_i w^+_{ij}$, i.e.

\[
w^+_{ij} =
\begin{cases}
w_{ij} & \text{if } w_{ij} \ge 0 \\
0 & \text{if } w_{ij} < 0
\end{cases}
\tag{11}
\]

and propagating relevances only through positively activated nodes

\[
R^{K,+}_j(x) =
\begin{cases}
R^K_j(x) & \text{if } \sum_i x_i w_{ij} + b_j \ge 0 \\
0 & \text{if } \sum_i x_i w_{ij} + b_j < 0
\end{cases}
\tag{12}
\]

where $x_i$ is the activation of node $i$ in layer $K-1$, $w_{ij}$ the weight of the connection between node $i$ in layer $K-1$ and node $j$ in layer $K$, and $b_j$ the bias in layer $K$.
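Equations 10 to 12 can be illustrated for a single fully-connected layer. The sketch below is a plain-Python transcription under stated assumptions: the function name and the toy weights are hypothetical, activations are taken to be non-negative (as after a ReLU), and an output node whose positive contributions sum to zero is skipped, a case the equations leave undefined.

```python
def zplus_backward(x, weights, bias, relevance_out):
    """Propagate relevance from layer K to layer K-1 with the z+ rule.

    x[i] is the activation of node i in layer K-1, weights[i][j] connects
    node i to node j in layer K, bias[j] and relevance_out[j] belong to layer K.
    """
    n_in, n_out = len(x), len(bias)
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        pre_activation = sum(x[i] * weights[i][j] for i in range(n_in)) + bias[j]
        if pre_activation < 0:
            continue  # Equation 12: inactive nodes pass no relevance
        z_plus = [x[i] * max(weights[i][j], 0.0) for i in range(n_in)]  # Equation 11
        total = sum(z_plus)
        if total == 0:
            continue  # nothing to distribute (assumption of this sketch)
        for i in range(n_in):
            relevance_in[i] += z_plus[i] / total * relevance_out[j]  # Equation 10
    return relevance_in


# Toy layer with two input and two output nodes (hypothetical values).
x = [1.0, 2.0]
weights = [[0.5, -1.0], [0.25, 0.5]]
relevance_in = zplus_backward(x, weights, bias=[0.0, 0.0], relevance_out=[1.0, 2.0])
```

In this toy case both output nodes are active, so the total relevance is conserved (Equation 8) and every propagated value is non-negative (Equation 9).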


References

[1] Bingjie Hu and Markus A. Lill. WATsite: Hydration site prediction program with PyMOL interface. J. Comput. Chem., 35(16):1255–1260, 2014.

[2] Ying Yang, Bingjie Hu, and Markus A. Lill. WATsite2.0 with PyMOL plugin: Hydration site prediction and visualization. In Methods Mol. Biol. (N.Y., NY, U.S.), pages 123–134. Springer, 2017.

[3] G. Madhavi Sastry, Matvey Adzhigirey, Tyler Day, Ramakrishna Annabhimoju, and Woody Sherman. Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. Journal of Computer-Aided Molecular Design, 27(3):221–234, Mar 2013.

[4] James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. ff14SB: Improving the accuracy of protein side chain and backbone parameters from ff99SB. Journal of Chemical Theory and Computation, 11(8):3696–3713, Jul 2015.

[5] H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma. The missing term in effective pair potentials. The Journal of Physical Chemistry, 91(24):6269–6271, Nov 1987.

[6] Swaroop Chatterjee, Pablo G. Debenedetti, Frank H. Stillinger, and Ruth M. Lynden-Bell. A computational investigation of thermodynamics, structure, dynamics and solvation behavior in modified water models. The Journal of Chemical Physics, 128(12):124511, Mar 2008.

[7] Jean-Paul Ryckaert, Giovanni Ciccotti, and Herman J. C. Berendsen. Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys., pages 327–341, 1977.

[8] T. Darden, D. York, and L. Pedersen. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys., 98:10089–10092, 1993.

[9] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pages 226–231. AAAI Press, 1996.

[10] Crystal N. Nguyen, Tom Kurtzman Young, and Michael K. Gilson. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril. J. Chem. Phys., 137(4):044101, Jul 2012.

[11] Matthew R. Masters, Amr H. Mahmoud, Ying Yang, and Markus A. Lill. Efficient and accurate hydration site profiling for enclosed binding sites. J. Chem. Inf. Model., 58:2183–2188, 2018. PMID: 30289252.

[12] Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes. Protein–Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model., 57(4):942–957, Apr 2017.

[13] David Ryan Koes, Matthew P. Baumgartner, and Carlos J. Camacho. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J. Chem. Inf. Model., 53(8):1893–1904, 2013. PMID: 23379370.

[14] Oleg Trott and Arthur J. Olson. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31(2):455–461, 2010.

[15] Noel M. O'Boyle, Michael Banck, Craig A. James, Chris Morley, Tim Vandermeersch, and Geoffrey R. Hutchison. Open Babel: An open chemical toolbox. J. Cheminformatics, 3(1):33, Oct 2011.

[16] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pages 675–678, New York, NY, USA, 2014. ACM.


[17] Christopher J. C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11(23-581):81, 2010.

[18] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):1–46, 2015.

[19] Joshua Hochuli, Alec Helbling, Tamar Skaist, Matthew Ragoza, and David Ryan Koes. Visualizing convolutional neural network protein-ligand scoring. J. Mol. Graph. Model., 84:96–108, 2018.

[20] Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit., 65:211–222, 2017.
