Post on 07-Dec-2021
Advances in Cheminformatics
Applications in Biotechnology, Drug Design and Bioseparations
Curt M. Breneman
Department of Chemistry and Chemical Biology / Center for Biotechnology and Interdisciplinary Studies
Rensselaer Polytechnic Institute
Presented at Siena College, NY 4/29/05
The Informatics Process
(Diagram labels: DATA, INFORMATION, KNOWLEDGE, UNDERSTANDING, WISDOM)
Representing Molecules
(Figure: 2D chemical structure drawing; surviving atom labels: O, H3C, N, N, CH3, N, CH3)
Descriptors: Quantifying Molecular Properties
Molecular Surface Properties
Electronic Properties
– Electrostatic Potential
$$EP(r) = \sum_{\alpha} \frac{Z_\alpha}{|r - R_\alpha|} - \int \frac{\rho(r')\,dr'}{|r - r'|}$$
– Electronic Kinetic Energy Density
$$K(r) = -(\psi^* \nabla^2 \psi + \psi \nabla^2 \psi^*)$$
$$G(r) = -\nabla\psi^* \cdot \nabla\psi$$
– Electron Density Gradients $\nabla\rho \cdot N$
– Laplacian of the Electron Density
$$L(r) = -\nabla^2 \rho(r) = K(r) - G(r)$$
– Local Average Ionization Potential
$$PIP(r) = \sum_{i} \frac{\rho_i(r)\,\varepsilon_i}{\rho(r)}$$
– Bare Nuclear Potential (BNP)
– Fukui function $F^+(r) = \rho_{HOMO}(r)$
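As a rough illustration of two of the properties listed above, the sketch below evaluates the nuclear term of the electrostatic potential (the sum of Z_α / |r − R_α| over nuclei, i.e. the bare nuclear potential) and the local average ionization potential (the density-weighted average of orbital energies) at a set of points. The function names, array layouts and the use of atomic units are assumptions made for this example; it is not the TAE/RECON implementation, and whether the orbital energies enter signed or as magnitudes is left to the caller.

```python
import numpy as np

def bare_nuclear_potential(points, nuclei_xyz, nuclear_charges):
    """Sum over nuclei of Z_alpha / |r - R_alpha| at each point (atomic units assumed)."""
    points = np.asarray(points, dtype=float)            # shape (n_points, 3)
    nuclei_xyz = np.asarray(nuclei_xyz, dtype=float)    # shape (n_nuclei, 3)
    charges = np.asarray(nuclear_charges, dtype=float)  # shape (n_nuclei,)
    # distances[p, a] = |r_p - R_a|
    distances = np.linalg.norm(points[:, None, :] - nuclei_xyz[None, :, :], axis=2)
    return (charges[None, :] / distances).sum(axis=1)

def local_average_ionization_potential(orbital_densities, orbital_energies):
    """PIP(r) = sum_i rho_i(r) * eps_i / rho(r), with rho(r) = sum_i rho_i(r)."""
    rho_i = np.asarray(orbital_densities, dtype=float)  # shape (n_orbitals, n_points)
    eps = np.asarray(orbital_energies, dtype=float)     # shape (n_orbitals,)
    rho = rho_i.sum(axis=0)                             # total density at each point
    return (rho_i * eps[:, None]).sum(axis=0) / rho

# Hypothetical usage: two unit charges on the z axis, evaluated at their midpoint.
bnp = bare_nuclear_potential([[0.0, 0.0, 1.0]], [[0, 0, 0], [0, 0, 2]], [1.0, 1.0])
pip = local_average_ionization_potential([[1.0, 3.0], [1.0, 1.0]], [2.0, 4.0])
```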
Why use Electron Density-Derived Molecular Descriptors?
Motivations
– Electron Density Distributions represent molecular properties that are key to biological activities
Enabling Technologies
– Fast methods (TAE/RECON) for obtaining electron density-derived properties
Encoding schemes
– Surface Property distributions (Histograms, Wavelets, Dixels)
– Shape/Property hybrid distributions (PEST)
Synergies
– Complementary to topological descriptors
Surface Property Distribution Histograms (RECON/TAE) Descriptors
Molecular surface property distributions can be represented as RECON/TAE histogram bin descriptors
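A minimal sketch of the histogram idea, assuming the surface is available as a list of sample points, each with a property value and an associated patch area. Each bin descriptor is then the surface area whose property value falls in that bin. The area weighting, bin count and function name are assumptions for illustration and may differ from what RECON actually does.

```python
import numpy as np

def histogram_bin_descriptors(values, areas, value_range, n_bins=10):
    """Area-weighted histogram of a surface property.

    values      : property value at each surface sample point
    areas       : surface area attributed to each sample point
    value_range : (low, high) limits shared by all molecules so bins are comparable
    """
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    descriptors, edges = np.histogram(values, bins=n_bins, range=value_range, weights=areas)
    return descriptors, edges

# Hypothetical usage: three surface patches, two bins over [0, 1].
desc, edges = histogram_bin_descriptors([0.1, 0.1, 0.9], [1.0, 2.0, 3.0], (0.0, 1.0), n_bins=2)
```

Using one fixed `value_range` for every molecule keeps bin k comparable across a data set, which is what lets the bins serve as descriptors.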
Surface Property Encoding
Molecular Surface Properties: Wavelet Coefficient Descriptors (WCD)
Wavelet Surface Property Reconstruction:
16 coefficients from S7 and D7 portions of the WCD vector represent surface property densities with >95% accuracy.
1024 raw wavelet coefficients capture PIP distribution on molecular surface.
Wavelet Decomposition:
– Creates a set of coefficients that represent a waveform.
– Small coefficients may be omitted to compress data.
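The two points above can be tried out with a small example. The sketch below uses an orthonormal Haar transform written in plain NumPy, chosen only because it is short; the slides do not say which wavelet family the WCD descriptors use, so Haar here is a stand-in. The function names are hypothetical.

```python
import numpy as np

def haar_forward(signal):
    """Orthonormal Haar decomposition of a signal whose length is a power of two."""
    coeffs = np.asarray(signal, dtype=float).copy()
    n = coeffs.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    while n > 1:
        half = n // 2
        evens, odds = coeffs[0:n:2].copy(), coeffs[1:n:2].copy()
        coeffs[:half] = (evens + odds) / np.sqrt(2.0)    # smooth (average) part
        coeffs[half:n] = (evens - odds) / np.sqrt(2.0)   # detail (difference) part
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward."""
    signal = np.asarray(coeffs, dtype=float).copy()
    n = 1
    while n < signal.size:
        smooth, detail = signal[:n].copy(), signal[n:2 * n].copy()
        signal[0:2 * n:2] = (smooth + detail) / np.sqrt(2.0)
        signal[1:2 * n:2] = (smooth - detail) / np.sqrt(2.0)
        n *= 2
    return signal

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (the compression step)."""
    compressed = np.zeros_like(coeffs)
    if k > 0:
        keep = np.argsort(np.abs(coeffs))[-k:]
        compressed[keep] = coeffs[keep]
    return compressed

# Hypothetical usage: a step-shaped property distribution sampled at 16 points.
step = np.array([1.0] * 8 + [5.0] * 8)
approx = haar_inverse(keep_largest(haar_forward(step), 2))
```

For smoother distributions more coefficients are needed, and the reconstruction error shrinks as `k` grows.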
Wavelet Representations of High-Resolution Molecular Surface Property Densities.
(1, 2, 10 and 20 Coefficient Decompositions)
Molecular Shape Encoding
Karthigeyan Nagarajan, Randy Zauhar, and William J. Welsh, “Enrichment of Ligands for the Serotonin Receptor Using the Shape Signatures Approach” J. Chem. Inf. Model., 45, 49-57 (2005)
Curt M. Breneman, C. Matthew Sundling, N. Sukumar, Lingling Shen, William P. Katt and Mark J. Embrechts, “New developments in PEST shape/property hybrid descriptors” J. Computer-Aided Mol. Design, 17, 231–240, (2003)
PEST: Molecular Shape/Property Hybrid Encoding
PEST (Property-Encoded Surface Translation)
– Adds shape information to encode the spatial relationships of surface properties
PEST Molecular Ray Tracing Algorithm
PEST Property-Encoded Rays
PEST Hybrid Shape/Property Histogram Convergence: Four sets of initial conditions
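To make the shape/property hybrid histogram concrete, here is a toy sketch under strong simplifying assumptions: the "molecular surface" is a unit sphere, each ray is re-launched in a random inward direction at every hit point (the actual reflection rule used by PEST is not given in these slides), and the surface property is any function of the hit position. Each ray segment contributes one count to a 2D histogram over (segment length, property value at the hit point). All names are hypothetical.

```python
import numpy as np

def random_inward_direction(rng, outward_normal):
    """Random unit vector pointing into the surface at a point with the given normal."""
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    return -d if d @ outward_normal > 0 else d

def pest_like_histogram(surface_property, n_segments=5000, bins=(5, 4),
                        property_range=(-1.0, 1.0), seed=0):
    """2D histogram of (ray segment length, property at hit point) inside a unit sphere."""
    rng = np.random.default_rng(seed)
    point = np.array([0.0, 0.0, 1.0])        # start on the surface
    lengths = np.empty(n_segments)
    props = np.empty(n_segments)
    for k in range(n_segments):
        direction = random_inward_direction(rng, point)
        # Solve |point + t * direction| = 1 for t > 0 on the unit sphere.
        t = -2.0 * (point @ direction)
        hit = point + t * direction
        hit /= np.linalg.norm(hit)           # remove round-off drift off the surface
        lengths[k] = t
        props[k] = surface_property(hit)
        point = hit
    hist, _, _ = np.histogram2d(lengths, props, bins=bins,
                                range=((0.0, 2.0), property_range))
    return hist / hist.sum()                 # normalise so bins sum to 1

# Hypothetical usage: the "property" is simply the z coordinate of the hit point.
hybrid = pest_like_histogram(lambda p: p[2], n_segments=500)
```

Running it with different seeds or starting points and comparing the resulting histograms is one simple way to check convergence with respect to the initial conditions.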
Machine Learning and Model Building
Model Building and Validation
(Workflow diagram labels: DATASET; Training set; Test set; Bootstrap sample k; Training / Validation; Learning Model; Tuning / Prediction; Predictive Model; Prediction)
Y-scrambling model validation!
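Y-scrambling checks that a model's fit is not a chance correlation: the responses are randomly permuted, the model is refit, and the fit quality on scrambled data should collapse compared with the real data. The sketch below uses ordinary least squares as a stand-in learner (the slides use PLS-type models); the function names and the number of rounds are assumptions.

```python
import numpy as np

def fit_r2(X, y):
    """r2 of an ordinary least-squares fit with intercept (stand-in for the real learner)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (residual @ residual) / (centered @ centered)

def y_scrambling(X, y, n_rounds=100, seed=0):
    """Return r2 on the true responses and r2 values for permuted responses."""
    rng = np.random.default_rng(seed)
    true_r2 = fit_r2(X, y)
    scrambled_r2 = np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_rounds)])
    return true_r2, scrambled_r2

# Hypothetical usage on synthetic descriptors with a real underlying relationship.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.05 * demo_rng.normal(size=50)
true_r2, scrambled_r2 = y_scrambling(X_demo, y_demo)
```

A trustworthy model shows `true_r2` well above every value in `scrambled_r2`.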
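Support Vector Regression
Linear hypotheses: $f(x) = wx + b$
(Figure: ε-tube of width $2\varepsilon$ around the regression line, bounded by $f(x) = (wx + b) + \varepsilon$ and $f(x) = (wx + b) - \varepsilon$, with slack variables $\xi$ and $\xi^*$)
Minimize: Empirical error + Complexity
$$\min\; C \sum_{i=1}^{m} L_\varepsilon(x_i) + \|w\|_1$$
Empirical error: ε-insensitive loss function:
$$L_\varepsilon(x) = \max(0,\; |y - f(x)| - \varepsilon)$$
Complexity control: l1-norm weight vector:
$$\|w\|_1 = \sum_{i=1}^{n} |w_i|$$
Linear program (with $w = u - v$):
$$\min\; \sum_{i=1}^{n} (u_i + v_i) + C\,\frac{1}{l} \sum_{j=1}^{l} (\xi_j + \xi_j^*) + C \nu \varepsilon$$
$$\text{s.t.}\quad y_j - \sum_{i=1}^{n} (u_i - v_i)\, x_{ji} - b \le \varepsilon + \xi_j$$
$$\sum_{i=1}^{n} (u_i - v_i)\, x_{ji} + b - y_j \le \varepsilon + \xi_j^*$$
$$u_i, v_i, \xi_j, \xi_j^*, \varepsilon \ge 0, \quad j = 1, \ldots, l, \quad i = 1, \ldots, n$$
(Figure: l1-norm vs. l2-norm)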
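A short sketch of the quantity being minimised, assuming a linear model f(x) = w·x + b: the ε-insensitive loss max(0, |y − f(x)| − ε) is summed over the training points, scaled by C, and added to the l1-norm of the weight vector. It only evaluates the objective for given parameters; it does not solve the linear program, and the function names are hypothetical.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """max(0, |y - f| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(np.asarray(y, dtype=float) - f) - eps)

def l1_svr_objective(w, b, X, y, C, eps):
    """C * (sum of eps-insensitive losses) + l1-norm of the weight vector."""
    w = np.asarray(w, dtype=float)
    f = np.asarray(X, dtype=float) @ w + b
    empirical_error = eps_insensitive_loss(y, f, eps).sum()
    complexity = np.abs(w).sum()
    return C * empirical_error + complexity

# Hypothetical usage: only the third point lies outside the tube of half-width 0.5.
objective = l1_svr_objective([1.0], 0.0, [[1.0], [2.0], [3.0]], [1.0, 2.0, 4.0], C=2.0, eps=0.5)
```

The l1 penalty tends to drive many weights exactly to zero, which is why it is attractive for models built on large descriptor sets.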
hERG: ROC Curve Comparisons
Leave-one-out results from different models
Before Feature Selection After Feature Selection
45 109 36
KPLS Test
3D QSAR: PEST vs. CoMFA
Criterion | PEST | CoMFA
Align. independent | Yes | No
Unsupervised | Yes | No
Preparation | Generate property isosurfaces | Grid resolution and fields
Computation Runtime | ~10 minutes / mol. | Depends…
Model Building | PLS, k-PLS, any induction learners (NN, decision trees) | PLS
Model Interpretation | Difficult | Intuitive
3D QSAR: CoMFA
Comparative Molecular Field Analysis
Standard in 3D QSAR methods
Requires alignment
Alignment rules do not perform well in unsupervised operations!
3D QSAR: PEST vs. CoMFA
RESULTS: trypsin
Method | q2 (training) | # components | r2 (testing)
PLS | 0.61 | 7 | 0.65
PLS | 0.76 | 9 | 0.85
PLS | 0.87 | 4 | 0.75
Bootstrap-PLS | 0.88 | 4 | 0.79
k-PLS | 0.96 | 4 | 0.79
(Row group labels on the slide: CoMFA, PEST)
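For readers unfamiliar with the two statistics reported here, the sketch below computes a leave-one-out cross-validated q2 on a training set and an r2 on a held-out test set, using one common definition of each (1 minus a squared-error sum divided by the variance sum). Ordinary least squares stands in for PLS, and all function names are assumptions; the exact definitions behind the slide's numbers are not stated.

```python
import numpy as np

def _fit(X, y):
    """Least-squares coefficients with an intercept column appended."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def _predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def q2_leave_one_out(X, y):
    """q2 = 1 - PRESS / SS, each point predicted by a model fit without it."""
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef = _fit(X[mask], y[mask])
        press += (y[i] - _predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def r2_test(X_train, y_train, X_test, y_test):
    """r2 of test-set predictions from a model fit on the training set only."""
    pred = _predict(_fit(X_train, y_train), X_test)
    return 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# Hypothetical usage on noise-free linear data, where both statistics approach 1.
X_lin = np.arange(10.0).reshape(-1, 1)
y_lin = 2.0 * X_lin[:, 0] + 1.0
q2 = q2_leave_one_out(X_lin[:7], y_lin[:7])
r2 = r2_test(X_lin[:7], y_lin[:7], X_lin[7:], y_lin[7:])
```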
3D QSAR: PEST vs. CoMFA
RESULTS
CoMFA can do this, …
What can PEST do?
3D QSAR: PEST vs. CoMFA
RESULTS
PLS indicates which descriptors are most important in the model
Graphical Analysis can localize PEST descriptor contributions, for example…
PEST Descriptor EP(6,1)
PEST Descriptor EP(2,5)
PEST for Protein Characterization
1AO6 and 135L Protein Surfaces (EP)
1AO6 135L
PPEST 1AO6.ep
PPEST 135L.ep
PPEST 1AO6.mlp2
PPEST 135L.mlp2
P.PHENYL (RECON+MOE)
P.PHENYL (RECON+MOE)
P.Phenyl (RECON+PEST+MOE)
Summary
Electron Density-Derived molecular property descriptors contain valuable physicochemical information
TAE descriptors are useful for building virtual high-throughput screening models (ADME, bioassay)
Predictive models can be built using TAE and PEST descriptors
Proteins (or protein binding sites) may be characterized using Protein PEST techniques
Current Software
RECON 5.8 + Analyze w/Outlier detection
– RAD
– Fast KPLS test set mode with low memory footprint
RECON for MOE
– Drop-in interactive or batch RECON 5.8 for MOE 2003
RECON 2001 for protein characterization
– Property moment descriptors (Cramer)
– Binding site/ligand scoring using Universal Descriptor Space (Tropsha)
TAE/DIXEL
– DNA Characterization and bioinformatics (Lawrence)
PEST (Compatible with Gaussian or Jaguar 5.0)
– PAD
– WSAD
– Wavelets
ACKNOWLEDGMENTS
Members of the DDASSL group
– Breneman Research Group (RPI Chemistry)
N. Sukumar, M. Sundling, C. Whitehead (Pfizer), L. Shen, L. Lockwood (Albany Molecular), M. Song, D. Zhuang, W. Katt, Q. Luo
– Embrechts Research Group (RPI DSES)
– Tropsha Research Group (UNC Chapel Hill)
– Bennett Research Group (RPI Mathematics)
Jinbo Bi
Collaborators:
– Lawrence Research Group (NYS Wadsworth Labs)
Inna Vitol
– Cramer Research Group (RPI Chemical Engineering)
Funding
– NIH (GM047372-07)
– NSF (BES-0214183, BES-0079436, IIS-9979860)
– GE Corporate R&D Center
– Millennium Pharmaceuticals
– Concurrent Pharmaceuticals
– Pfizer Pharmaceuticals
– ICAGEN Pharmaceuticals
– Eastman Kodak Company