concept map

PROTEINS: Structure, Function, and Bioinformatics

SFCscore: Scoring functions for affinity prediction of protein–ligand complexes

Christoph A. Sotriffer, Paul Sanschagrin, Hans Matter, and Gerhard Klebe*

Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, D-35032 Marburg, Germany

INTRODUCTION

Computational techniques have gained considerable importance in structure-based drug design [1,2]. One of the main tasks of computational methods in ligand design and lead identification is the elucidation and assessment of interaction modes between small-molecule ligands and protein target structures. This generally requires estimating the relative or absolute affinity of a protein–ligand complex from its three-dimensional coordinates. Assuming conditions of equilibrium thermodynamics, the affinity of a ligand interacting noncovalently with a protein is defined by the equilibrium constant for the reaction:

protein(aq) + ligand(aq) ⇌ protein–ligand complex(aq)

This may be expressed either as the association constant K_a or as its inverse, the dissociation constant K_d, from which the Gibbs free energy of binding, ΔG, can readily be calculated. For enzyme inhibitors, the affinity is more commonly specified in terms of the inhibition constant K_i (determined by means of kinetic assays), which generally corresponds to the dissociation constant of the enzyme–inhibitor complex. Although statistical thermodynamics would in principle provide the necessary equations to calculate free energies of binding from molecular properties, these equations are not readily amenable to computation, since appropriate ensembles of the solvated systems must be generated and thoroughly sampled, which normally requires prohibitively long computing times [3–7]. For the purpose of drug design, simpler and faster methods are needed, which are commonly referred to as scoring functions.
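The step from a dissociation (or inhibition) constant to a Gibbs free energy of binding can be illustrated with a short sketch. It assumes the standard relation ΔG° = RT ln(K_d / c°) with a standard concentration c° of 1 mol/L and a temperature of 298.15 K; neither the temperature nor the function names below come from the article, they are chosen here for illustration only.

```python
import math

# Gas constant in kcal/(mol*K); an assumed unit choice for this sketch.
R_KCAL = 1.987204e-3


def delta_g_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from K_d given in mol/L.

    Uses dG = RT * ln(K_d / 1 M); tighter binding (smaller K_d) gives a
    more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def pk_from_k(k_molar):
    """Negative decadic logarithm of a constant in mol/L (e.g. pK_i, pK_d)."""
    if k_molar <= 0:
        raise ValueError("constant must be positive")
    return -math.log10(k_molar)


def delta_g_from_pk(pk, temperature=298.15):
    """Same free energy as delta_g_from_kd, starting from the pK value."""
    return -R_KCAL * temperature * math.log(10.0) * pk


if __name__ == "__main__":
    kd = 1e-9  # a hypothetical 1 nM binder
    print(round(pk_from_k(kd), 2), round(delta_g_from_kd(kd), 2))
```

Under these assumptions, a 1 nM ligand corresponds to roughly -12.3 kcal/mol, and one pK unit corresponds to about 1.36 kcal/mol, which is a convenient way to read errors that are reported in pK_i units.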
Scoring functions try to estimate the binding affinity from a single configuration of the protein–ligand complex, normally without consideration of explicit water molecules. To compensate for this fundamental simplification, experimental data are normally used to derive such functions. Scoring functions thus try to capture the essential elements of protein–ligand interactions in a computationally efficient way.

In general, three major classes of scoring functions can be distinguished: force-field-based methods, knowledge-based potentials, and

Christoph A. Sotriffer's current address is Institute of Pharmacy and Food Chemistry, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany.
Paul Sanschagrin's current address is Schrodinger, Inc., 120 W 45th St, New York, NY 10036, USA.
Hans Matter's current address is Sanofi-Aventis Deutschland GmbH, Chemical Sciences, Drug Design, Industriepark Höchst, D-65926 Frankfurt am Main, Germany.
Additional Supporting Information may be found in the online version of this article.
Grant sponsor: Scoring Function Consortium.
*Correspondence to: Gerhard Klebe, Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany. E-mail: [email protected]
Received 19 May 2007; Revised 25 January 2008; Accepted 15 February 2008
Published online 28 April 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.22058

ABSTRACT

Empirical scoring functions to calculate binding affinities of protein–ligand complexes have been calibrated based on experimental structure and affinity data collected from public and industrial sources. Public data were taken from the AffinDB database, whereas access to industrial data was gained through the Scoring Function Consortium (SFC), a collaborative effort with various pharmaceutical companies and the Cambridge Crystallographic Data Center.
More than 850 complexes were obtained by the data collection procedure and subsequently used to set up different training sets for the parameterization of new scoring functions. Over 60 different descriptors were evaluated for all complexes, including terms accounting for interactions with and among aromatic ring systems as well as many surface-dependent terms. After exploratory correlation and regression analyses, stepwise variable selection procedures, and systematic searches, the most suitable descriptors were chosen as variables to calibrate regression functions by means of multiple linear regression or partial least squares analysis. Eight different functions are presented herein. Cross-validated r² (Q²) values of up to 0.72 and standard errors (s_PRESS) generally below 1.15 pK_i units suggest highly predictive functions. Extensive unbiased validation was carried out by testing the functions on large data sets from the PDBbind database as used by Wang et al. (J Chem Inf Comput Sci 2004;44:2114–2125) in a comparative analysis of other scoring functions. Superior performance of the SFCscore functions is observed in many cases, but the results also illustrate the need for further improvements.

Proteins 2008; 73:395–419. © 2008 Wiley-Liss, Inc.

Key words: empirical scoring function; multiple linear regression; PLS; structure-based drug design.
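The cross-validated Q² and s_PRESS figures quoted in the abstract can be made more tangible with a minimal sketch of leave-one-out cross-validation for a multiple linear regression model. This is not the authors' procedure or code: the descriptors, the leave-one-out scheme, and the convention s_PRESS = sqrt(PRESS / (n - k - 1)) for n complexes and k descriptors are assumptions made here for illustration, and all function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, x):
    """Predicted affinity for one descriptor vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def loo_q2(X, y):
    """Return (Q2, s_PRESS) from leave-one-out cross-validation."""
    n, k = len(y), len(X[0])
    press = 0.0
    for i in range(n):
        # Refit without complex i, then predict the held-out value.
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / n
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_total, math.sqrt(press / (n - k - 1))
```

Called as `loo_q2(X, y)` with `X` a list of descriptor lists and `y` a list of measured pK_i values, the function returns both statistics. Q² can never exceed 1 and becomes negative when the held-out predictions are worse than simply predicting the mean.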
