Adventures in Computational Enzymology
John Mitchell
University of St Andrews
Mechanism, Annotation and Classification in Enzymes.
http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
The MACiE Database
G.L. Holliday et al., Nucl. Acids Res., 35, D515-D520 (2007)
Gemma Holliday, Daniel Almonacid, Noel O’Boyle,
Janet Thornton, Peter Murray-Rust, Gail Bartlett,
James Torrance, John Mitchell
Enzyme Nomenclature and Classification
EC Classification
Class
Subclass
Sub-subclass
Serial number
The EC Classification
Reaction direction arbitrary
Cofactors and active site residues ignored
Doesn’t deal with structural and sequence information
However, it was never intended to do so
Deals with overall reaction, not mechanism
A New Representation of Enzyme Reactions?
Should be complementary to, but distinct from, the EC system
Should take into account:
Reaction Mechanism
Structure
Sequence
Active Site residues
Cofactors
Need a database of enzyme mechanisms
Mechanism, Annotation and Classification in Enzymes.
http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
MACiE Database
Global Usage of MACiE
MACiE Entries
MACiE Mechanisms are Sourced from the Literature
Coverage of MACiE
Representative – based on a non-homologous dataset, and chosen to represent each available EC sub-subclass.
EC is not Everything
• Different mechanisms can occur with exactly the same EC number.
• MACiE has six beta-lactamases, all with different mechanisms but the same overall reaction.
EC Coverage of MACiE
Representative – based on a non-homologous dataset, and chosen to represent each available EC sub-subclass.
Structures exist for:
6 EC 1.-.-.-
61 EC 1.2.-.-
204 EC 1.2.3.-
1776 EC 1.2.3.4
MACiE covers:
6 EC 1.-.-.-
57 EC 1.2.-.-
183 EC 1.2.3.-
321 EC 1.2.3.4
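The fraction of each EC level that MACiE covers follows directly from the two lists above. A minimal sketch of that arithmetic in Python; the dictionary names are illustrative, and the level labels are taken from the EC hierarchy listed earlier (class, subclass, sub-subclass, serial number):

```python
# Counts copied from the slide: EC codes with a known structure vs. those in MACiE.
structures = {"class": 6, "subclass": 61, "sub-subclass": 204, "serial number": 1776}
macie = {"class": 6, "subclass": 57, "sub-subclass": 183, "serial number": 321}


def coverage(covered, available):
    """Return the percentage of available EC codes that are covered, per level."""
    return {level: 100.0 * covered[level] / available[level] for level in available}


macie_coverage = coverage(macie, structures)
for level, pct in macie_coverage.items():
    print(f"{level:>13}: {pct:5.1f}%")
```

On these numbers MACiE covers roughly 90% of the sub-subclasses that have structures but under a fifth of the individual EC numbers, which is what one would expect from a dataset chosen to be representative at the sub-subclass level.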
Repertoire of Enzyme Catalysis
G.L. Holliday et al., J. Molec. Biol., 372, 1261-1277 (2007)
G.L. Holliday et al., J. Molec. Biol., 390, 560-577 (2009)
Repertoire of Enzyme Catalysis
[Figure: bar chart of the number of steps in MACiE for each reaction type (heterolytic elimination, homolytic elimination, electrophilic addition, nucleophilic addition, homolytic addition, electrophilic substitution, nucleophilic substitution, homolytic substitution), broken down into intramolecular, bimolecular and unimolecular steps; vertical axis 0 to 140.]
Enzyme chemistry is largely nucleophilic
Repertoire of Enzyme Catalysis
[Figure: bar chart of the number of steps in MACiE by reaction type: proton transfer, AdN2, E1, SN2, E2, radical reaction, tautomerisation, others; vertical axis 0 to 450.]
We do see a few steps corresponding to well-known organic reactions; but these are the exception.
Residue Catalytic Propensities
Residue Catalytic Functions
Phospholipidosis
Lowe et al., Molec. Pharmaceutics, 7, 1708 (2010)
• An adverse effect caused by drugs
• Excess accumulation of phospholipids
• Often by cationic amphiphilic drugs
• Affects many cell types
• Causes delay in the drug development process
• May or may not be related to human pathologies such as Niemann-Pick disease
Hiraoka, M. et al. 2006. Mol. Cell. Biol. 26(16):6139-6148
Electron micrographs of alveolar macrophages (A and B) and peritoneal macrophages (C and D) obtained from 3-month-old Lpla2+/+ and Lpla2-/- mice
Tomizawa et al.,
Literature Mined Dataset
R. Lowe, R.C. Glen, J.B.O. Mitchell, Mol. Pharm., 7(5), 1708–1714 (2010)
• Produced our own dataset of 185 compounds (from literature survey)
• 102 PPL+ and 83 PPL-
• Each compound is an experimentally confirmed positive or negative
Some PPL+ molecules, from Reasor et al., Exp Biol Med, 226, 825 (2001)
Represent molecules using descriptors (we used E-Dragon & Circular Fingerprints)
10001101010011001101 10110101000011101101
10111101010001001100 10000001110011100111
10100101011101001110 10011111110001001010
Split data into N folds, then train on (N-2) of them, keeping one for parameter optimisation and one for unseen testing. Average results over all runs (each molecule is predicted once per N-fold validation).
We also repeat the whole process several times with randomly different assignments of which molecules are in which folds.
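The fold rotation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: N = 5, three repeats, the placeholder molecule IDs, and the choice of the fold after the test fold as the parameter-optimisation fold are all hypothetical, since the talk does not specify them.

```python
import random


def make_folds(items, n_folds, seed):
    """Shuffle the items and deal them into n_folds roughly equal folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def fold_rotations(folds):
    """Yield (train, validation, test) so that each fold is the test fold once."""
    n = len(folds)
    for t in range(n):
        v = (t + 1) % n  # assumption: the next fold is used for parameter optimisation
        train = [m for k, fold in enumerate(folds) if k not in (t, v) for m in fold]
        yield train, folds[v], folds[t]


molecules = [f"mol{i}" for i in range(185)]  # placeholder IDs for the 185 compounds
for repeat in range(3):  # repeat with a different random assignment to folds
    folds = make_folds(molecules, n_folds=5, seed=repeat)
    tested = [m for _, _, test in fold_rotations(folds) for m in test]
    assert sorted(tested) == sorted(molecules)  # each molecule is predicted exactly once
```

Keeping the optimisation fold separate from the test fold means the reported performance comes from molecules that influenced neither the model fit nor the parameter choice.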
Experimental Design
Models are built using machine learning techniques such as Random Forest …
… or Support Vector Machine
Average MCC Values:
RF SVM
0.619 0.650
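For reference, the Matthews correlation coefficient (MCC) quoted above is computed from the four confusion-matrix counts. A minimal Python sketch; the function name and the example counts are hypothetical and are not results from the study:

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for 102 PPL+ and 83 PPL- compounds (not from the talk).
example_mcc = mcc(tp=85, tn=65, fp=18, fn=17)
print(round(example_mcc, 3))
```

MCC runs from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), and it remains informative when the two classes differ in size.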
Results
So we have built a good predictive model that can learn the features that predispose a molecule to being PPL+, and can make predictions from chemical structure.
This is useful – one could add it to a virtual screening protocol.
But can we understand anything new about how phospholipidosis occurs?
Read up on gene expression studies related to phospholipidosis …
Sawada et al. listed genes which they found to be up- or down- regulated in phospholipidosis
As with all gene expression experiments, some of these will be highly relevant, others will be noise. Can we help interpret these data?
Mechanism?
H. Sawada, K. Takami, S. Asahi Toxicological Sciences 2005 282-292
What expertise do we have available amongst our team, colleagues & collaborators?
• Multiple target prediction
• Maths
• Programming
Florian Nigsch
Hamse Mussa
Rob Lowe
• Multiple target prediction
Predicting off-target interactions of drugs. Not with the primary pharmaceutical target, but with other targets relevant to side effects.
ChEMBL
Filtered ChEMBL, 241145 compounds & 1923 targets
Data mining and filtering
Random 99:1 split of the whole dataset, 10 repeats
10 models
Phospholipidosis dataset: 100 PPL+, 82 PPL- compounds
Predicted target associations
Target PS scores
ChEMBL Mining
• Mined the ChEMBL (03) database for compounds and targets they interact with
• Target description included the word "enzyme", "cytosolic", "receptor", "agonist" or "ion channel"
• A high cut-off (weak binding) was used on Ki/Kd/IC50 values (< 500μM) to define activity
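The two filters above (a keyword match on the target description and an activity cut-off) can be sketched as follows. The record layout, field names and example records are hypothetical and do not reflect the actual ChEMBL schema; only the keyword list and the 500 μM cut-off come from the slide.

```python
KEYWORDS = ("enzyme", "cytosolic", "receptor", "agonist", "ion channel")
CUTOFF_UM = 500.0  # activity cut-off in micromolar, from the slide


def is_active_pair(record):
    """Keep a compound-target record if the target description contains a keyword
    and the measured Ki/Kd/IC50 value is below the cut-off."""
    description = record["target_description"].lower()
    keyword_hit = any(word in description for word in KEYWORDS)
    return keyword_hit and record["value_uM"] < CUTOFF_UM


# Made-up records for illustration only.
records = [
    {"compound": "cmpd_1", "target_description": "Muscarinic acetylcholine receptor M1", "value_uM": 0.8},
    {"compound": "cmpd_2", "target_description": "Enzyme: phospholipase A2", "value_uM": 750.0},
    {"compound": "cmpd_3", "target_description": "Transporter", "value_uM": 2.0},
]
active_pairs = [r for r in records if is_active_pair(r)]
print([r["compound"] for r in active_pairs])
```

A permissive cut-off like this keeps weak binders in the training data, which suits the search for off-target interactions rather than primary targets.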
Method
• Number of Compounds: 241145
• Number of Targets: 1923
• Split the data into 10 different partitions of training and validation
• Used circular fingerprints with SYBYL atom types to define similarities between molecules
Multi-class Classification
Algorithms:
• Parzen-Rosenblatt window
• Naive Bayes
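Parzen-Rosenblatt window
$$p(x_i \mid \omega) = \frac{1}{N_{\omega}} \sum_{x_j \in \omega} K(x_i, x_j)$$
using a Gaussian kernel
$$K(x_i, x_j) = \frac{1}{(h\sqrt{2\pi})^{d}} \exp\left(-\frac{(x_i - x_j)^{T}(x_i - x_j)}{2h^{2}}\right)$$
$(x_i - x_j)^{T}(x_i - x_j)$ corresponds to the number of features in which $x_i$ and $x_j$ disagree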
• Rank likely targets using estimates of class-conditional probabilities
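A minimal Python sketch of a Parzen-Rosenblatt window estimate with a Gaussian kernel on binary fingerprints, used to rank targets by estimated class-conditional probability. The function names, the window width h = 1.0 and the four-bit toy fingerprints are assumptions made for illustration; this is not the code used in the study.

```python
import math


def gaussian_kernel(xi, xj, h):
    """Gaussian kernel on two equal-length binary fingerprints (lists of 0/1)."""
    d = len(xi)
    mismatches = sum((a - b) ** 2 for a, b in zip(xi, xj))  # (xi - xj)^T (xi - xj)
    return math.exp(-mismatches / (2 * h * h)) / (h * math.sqrt(2 * math.pi)) ** d


def class_conditional(x, class_members, h):
    """Parzen-Rosenblatt estimate of p(x | class): mean kernel value over the class."""
    return sum(gaussian_kernel(x, xj, h) for xj in class_members) / len(class_members)


def rank_targets(x, training, h=1.0):
    """Sort target names by decreasing estimated class-conditional probability."""
    scores = {t: class_conditional(x, members, h) for t, members in training.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy training set: two hypothetical targets, each with two known ligands.
training = {
    "target_A": [[1, 0, 1, 1], [1, 0, 1, 0]],
    "target_B": [[0, 1, 0, 0], [0, 1, 1, 0]],
}
ranking = rank_targets([1, 0, 1, 1], training)
print(ranking)
```

Because the normalisation constant is the same for every class, it does not change the ranking; for fingerprints with thousands of bits it would be safer to drop it or work with logarithms to avoid numerical underflow or overflow.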
Partition No.   PRW Rank   NB Rank
1               17.049     74.104
2               16.343     76.251
3               18.424     79.078
4               16.212     73.539
5               17.339     73.535
6               18.630     77.244
7               20.694     78.560
8               18.870     74.464
9               16.584     76.235
10              18.200     78.077
Average         17.835     76.109
When we test the two methods, PRW ranks known targets better than Naïve Bayes does (average rank 17.8 versus 76.1). Hence we use PRW for our study.
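The averages in the table can be checked directly from the per-partition values. A short Python sketch; the variable names are illustrative, and a lower rank is taken to mean a better prediction, as the comparison above implies:

```python
from statistics import mean

# Per-partition ranks copied from the table (lower is assumed to be better).
prw_ranks = [17.049, 16.343, 18.424, 16.212, 17.339, 18.630, 20.694, 18.870, 16.584, 18.200]
nb_ranks = [74.104, 76.251, 79.078, 73.539, 73.535, 77.244, 78.560, 74.464, 76.235, 78.077]

prw_mean = mean(prw_ranks)
nb_mean = mean(nb_ranks)
prw_wins = sum(p < n for p, n in zip(prw_ranks, nb_ranks))  # partitions where PRW ranks better

print(f"PRW mean rank {prw_mean:.3f}, NB mean rank {nb_mean:.3f}, PRW better in {prw_wins}/10")
```

PRW gives the lower rank in every one of the ten partitions, so the preference for PRW does not depend on any single split.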
Assemble List of Targets Relevant to Sawada’s Suggested Mechanisms
Mechanisms:
1. Inhibition of lysosomal phospholipase activity;
2. Inhibition of lysosomal enzyme transport;
3. Enhanced phospholipid biosynthesis;
4. Enhanced cholesterol biosynthesis.
Assemble List of Targets Relevant to Sawada’s Suggested Mechanisms
Inhibition of lysosomal phospholipase activity
Enhanced phospholipid biosynthesis
Enhanced cholesterol biosynthesis
Assigning Scores to Targets
$$PS(\omega) = \sum_{i=1}^{N} C_p(x_i)$$
• Use these 10 models of target interactions
• Predict targets for phospholipidosis dataset
• Score targets according to the likelihood of involvement in phospholipidosis
• Use the top 100 predicted targets per compound as we seek off-target interactions
• Score measures tendency of target to interact with PPL+ rather than PPL- compounds.
M1 & M5 are involved in phospholipase C regulation & may be relevant; but not in Sawada’s list.
We consider a PS score significant if the target is predicted to interact with at least 50 more PPL+ compounds than PPL- compounds.
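One way to read the PS score is as a signed count per target: +1 for each PPL+ compound predicted to interact with it and -1 for each PPL- compound. That reading is an assumption inferred from the significance criterion above, and the sketch below uses made-up compounds and targets rather than data from the study.

```python
from collections import Counter


def ps_scores(predicted_targets, labels):
    """PS per target: (# PPL+ compounds predicted to hit it) - (# PPL- compounds)."""
    scores = Counter()
    for compound, targets in predicted_targets.items():
        sign = 1 if labels[compound] == "PPL+" else -1
        for target in set(targets):  # count each target once per compound
            scores[target] += sign
    return dict(scores)


def significant_targets(scores, threshold=50):
    """Targets whose PS is at least the threshold (50 on the slide)."""
    return sorted(t for t, s in scores.items() if s >= threshold)


# Toy example with made-up compounds and targets (not data from the study).
labels = {"c1": "PPL+", "c2": "PPL+", "c3": "PPL-"}
predicted_targets = {"c1": ["T1", "T2"], "c2": ["T1"], "c3": ["T2", "T3"]}
toy_scores = ps_scores(predicted_targets, labels)
print(toy_scores, significant_targets(toy_scores, threshold=2))
```

In the study each compound would contribute its top 100 predicted targets, so a target predicted equally often for PPL+ and PPL- compounds scores close to zero.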
Our Scores for 8 of Sawada’s PPL-Relevant Targets
Mechanism   Target                                                               Rank    PS
1           Sphingomyelin phosphodiesterase (SMPD) (h)                           225     55
1           Lysosomal Phospholipase A1 (LYPLA1) (r)                              163=    90
1           Phospholipase A2 (PLA2) (h)                                          152=    97
3           Elongation of very long chain fatty acids protein 6 (ELOVL6) (h)     1203=   -10
3           Acyl-CoA desaturase (SCD) (m)                                        610=    0
4           3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) (h)          456=    10
4           Squalene monooxygenase (SQLE) (h)                                    437=    14
4           Lanosterol synthase (LSS) (h)                                        114=    134
Mechanism 1: Inhibition of lysosomal phospholipase activity
Mechanism 3: Enhanced phospholipid biosynthesis
Mechanism 4: Enhanced cholesterol biosynthesis
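Applying the stated significance threshold of 50 to the eight PS values in the table gives a quick check on which of Sawada's targets stand out. A short Python sketch; the gene symbols are used as dictionary keys and the variable names are illustrative:

```python
# PS values copied from the table of Sawada's PPL-relevant targets.
table_ps = {
    "SMPD": 55, "LYPLA1": 90, "PLA2": 97,   # mechanism 1
    "ELOVL6": -10, "SCD": 0,                # mechanism 3
    "HMGCR": 10, "SQLE": 14, "LSS": 134,    # mechanism 4
}
PS_THRESHOLD = 50  # significance threshold stated on the slide

significant = sorted(t for t, ps in table_ps.items() if ps >= PS_THRESHOLD)
print(significant)
```

On these values all three mechanism 1 targets and lanosterol synthase from mechanism 4 pass the threshold, while neither of the mechanism 3 targets does.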
Other Mechanisms
• The mechanisms and targets suggested here are insufficient to explain all the PPL+ compounds in our data set.
• We expect that other targets and possibly mechanisms are important.
• Our method can’t test direct compound – phospholipid binding.
ACKNOWLEDGEMENTS
Dr Gemma Holliday
Dr Rob Lowe
Dr Daniel Almonacid
Prof. Janet Thornton
Dr Florian Nigsch
Dr Hamse Mussa
Prof. Bobby Glen
Dr Andreas Bender
Alexios Koutsoukas
ACKNOWLEDGEMENTS
Cambridge Overseas Trust