05/02/2008 jae hyun kim genome scale enzyme-metabolite and drug-target interaction predictions using...
05/02/2008
Jae Hyun Kim
Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor
Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.
• Terminology
• Motivation
• Method
  • Molecular Signature
  • Signature Kernel
  • Signature Product Kernel
• Results
• Conclusion
Contents
• Catalyst: increases the rate of a chemical reaction or biological process; itself remains unchanged
• Enzyme: biomolecule that catalyzes chemical reactions; usually a protein
• Metabolite: intermediate or product of metabolism; restricted to small molecules
Terminology (1)
Reference: www.wikipedia.org
• Inhibitor: molecule that decreases enzyme activity; competes with substrates; most drugs and poisons are inhibitors
Terminology (2)
Reference: www.wikipedia.org
• EC Number: numerical classification scheme for enzyme-catalyzed reactions
• Four levels of hierarchy
• Example: EC 3.4.11.4 (tripeptide aminopeptidases)
  • EC 3: hydrolases (enzymes that use water to break up some other molecule)
  • EC 3.4: hydrolases that act on peptide bonds
  • EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  • EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide
Enzyme Commission (EC) Number
Reference: www.wikipedia.org
Genome scale
enzyme-metabolite and drug-target interaction
predictions
using the signature molecular descriptor
Motivation
Protein-Chemical Interaction
Large-scale
Machine-learning Technique
• G = (V, E): molecular graph
  • V: vertex (atom) set
  • E: edge (bond) set
• Atomic signature
  • Canonical representation of the subgraph surrounding a particular atom
  • Includes atoms and bonds up to a predefined distance (height)
• Molecular signature of G: ${}^h\sigma(G)$
  • ${}^h\sigma_G(x)$: atomic signature in G rooted at x, of height h
• Height
  • Chemicals: 0~6
  • Proteins: 6~18 (amino acid residues: 1~7)
Molecular Signature
Molecular Signature: Example
(Leucine) (Isoleucine) (Glycine)
• Depth-first search up to “height” deep
• '(' going down, ')' going back up
• c_, n_: sp3 carbon/nitrogen atom
• c=, o=: sp2 (double-bond) carbon/oxygen atom
• h_: hydrogen
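The depth-first construction described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the canonicalization algorithm of the cited paper: the graph encoding (`atoms`, `bonds`), the function names, and the exact string layout are assumptions, and child branches are simply sorted as strings to get a repeatable order. The toy molecule is methylamine (CH3-NH2), chosen only because it is small.

```python
from collections import Counter

def atomic_signature(atoms, bonds, root, height):
    """Simplified atomic signature: DFS from `root`, at most `height` bonds deep.

    '(' is written when going down to a neighbor, ')' when coming back up.
    Sorting the child strings is an assumed stand-in for true canonicalization.
    """
    def visit(node, parent, depth):
        label = atoms[node]
        if depth == height:
            return label
        children = sorted(
            visit(nbr, node, depth + 1)
            for nbr in bonds[node]
            if nbr != parent  # do not walk straight back along the same bond
        )
        return label + "".join("(" + child + ")" for child in children)

    return visit(root, None, 0)

def molecular_signature(atoms, bonds, height):
    """Count how often each atomic signature occurs over all atoms of the graph."""
    return Counter(atomic_signature(atoms, bonds, x, height) for x in atoms)

# Toy molecular graph (hypothetical encoding): methylamine, CH3-NH2.
atoms = {0: "c_", 1: "n_", 2: "h_", 3: "h_", 4: "h_", 5: "h_", 6: "h_"}
bonds = {0: [1, 2, 3, 4], 1: [0, 5, 6], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1]}

print(atomic_signature(atoms, bonds, 0, 1))  # c_(h_)(h_)(h_)(n_)
print(molecular_signature(atoms, bonds, 1))
```

At height 0 the molecular signature reduces to a count of atom types; increasing the height makes each atomic signature describe a larger neighborhood.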
General form of an enzymatic reaction R: $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$
Height-h signature of reaction R
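A minimal sketch of one way to compute a reaction signature, assuming it is defined as the stoichiometry-weighted sum of the product signatures minus that of the substrate signatures. That definition is an assumption here, as are the function name and the representation of a molecular signature as a dictionary of counts.

```python
from collections import Counter

def reaction_signature(substrates, products):
    """Assumed definition: sum_j p_j * sig(P_j) - sum_i s_i * sig(S_i).

    `substrates` and `products` are lists of (coefficient, signature) pairs,
    where each signature maps an atomic-signature string to its count.
    """
    total = Counter()
    for coeff, sig in products:
        for key, count in sig.items():
            total[key] += coeff * count
    for coeff, sig in substrates:
        for key, count in sig.items():
            total[key] -= coeff * count
    # Keep only the entries that actually change during the reaction.
    return {key: count for key, count in total.items() if count != 0}

# Toy signatures with placeholder keys (not real chemistry).
substrates = [(1, {"a": 2, "b": 1})]
products = [(2, {"a": 1}), (1, {"c": 1})]
print(reaction_signature(substrates, products))  # {'b': -1, 'c': 1}
```

Under this assumption, atomic environments untouched by the reaction cancel out, so only the environments that are created or destroyed remain.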
Reaction Signature
• To predict/classify protein-protein interactions
• To measure similarity between two pairs of proteins
• Kernel function: $K((X_1, X_2), (X'_1, X'_2))$
• How to measure similarity between pairs?
Pairwise Kernel
• Pairwise similarity by component similarity: if $X_1 \sim X'_1$ and $X_2 \sim X'_2$, then $(X_1, X_2) \sim (X'_1, X'_2)$
• Assess similarity between pairs directly: $(x_{12})_{ij} = x_{1i} x_{2j} + x_{2i} x_{1j}$ is the pairwise representation of $(X_1, X_2)$
Similarity inside the pair Similarity between pairs
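The two views on this slide can be connected with a short sketch. It assumes the symmetric pairwise kernel form $K'(X_1, X'_1) K'(X_2, X'_2) + K'(X_1, X'_2) K'(X_2, X'_1)$ from the Ben-Hur and Noble paper cited on this slide, with a plain dot product standing in for the component kernel $K'$; the function names are illustrative. With that choice, the dot product of two pairwise representations equals twice the pairwise kernel.

```python
def dot(u, v):
    """Plain dot product, used here as the component kernel K'."""
    return sum(a * b for a, b in zip(u, v))

def pair_features(x1, x2):
    """Pairwise representation: entry (i, j) is x1[i]*x2[j] + x2[i]*x1[j]."""
    return [x1[i] * x2[j] + x2[i] * x1[j]
            for i in range(len(x1)) for j in range(len(x2))]

def pairwise_kernel(x1, x2, y1, y2, k=dot):
    """Symmetric pairwise kernel built from a component kernel k."""
    return k(x1, y1) * k(x2, y2) + k(x1, y2) * k(x2, y1)

x1, x2 = [1, 2], [3, 4]
y1, y2 = [0, 1], [2, 5]

print(pairwise_kernel(x1, x2, y1, y2))                   # 100
print(dot(pair_features(x1, x2), pair_features(y1, y2)))  # 200
```

The kernel form never builds the quadratic-size feature vector, which is the practical reason to prefer it when the component vectors are long.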
Kernel Types
From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
P: Protein, C: Chemical
Definition: signature of the complex PC
Two pairs of P-C interactions: (P, C) and (Q, D)
Signature Product Kernel (1/2)
Signature Product Kernel : Example
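As an illustration only, the sketch below assumes that the signature kernel between two molecules is the inner product of their signature count vectors, and that the signature product kernel between two protein-chemical pairs (P, C) and (Q, D) is the product of the protein-protein and chemical-chemical signature kernels. Both definitions, the function names, and the toy signatures are assumptions made for this example.

```python
def signature_kernel(sig_a, sig_b):
    """Assumed signature kernel: inner product of two signature count vectors.

    Each signature is a dict mapping an atomic-signature string to its count.
    """
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())

def signature_product_kernel(p, c, q, d):
    """Assumed product kernel between pairs (P, C) and (Q, D)."""
    return signature_kernel(p, q) * signature_kernel(c, d)

# Toy signatures with placeholder keys (not real molecules).
protein_p = {"a": 2, "b": 1}
protein_q = {"a": 1, "c": 4}
chemical_c = {"x": 3}
chemical_d = {"x": 2, "y": 1}

print(signature_kernel(protein_p, protein_q))    # 2
print(signature_kernel(chemical_c, chemical_d))  # 6
print(signature_product_kernel(protein_p, chemical_c, protein_q, chemical_d))  # 12
```

Under these assumptions two pairs score highly only when both the proteins and the chemicals share atomic environments.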
Signature Similarity VS. Sequence Alignment Scores
• Computed for every pair of amino acids
• Correlation: chemically similar amino acids have a high BLOSUM62 score
• Positive examples: downloaded from KEGG (more than 50, at most 500)
• Negative examples: equal number, randomly selected
• Signature kernel, 5-fold cross-validation (CV)
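The evaluation setup above can be sketched as follows. This is a generic illustration of balanced negative sampling and 5-fold cross-validation, not the authors' pipeline: the function names, the fixed seeds, and the integer stand-ins for examples are assumptions.

```python
import random

def balanced_dataset(positives, candidates, seed=0):
    """Pair each positive with one randomly drawn negative (label 1 vs. 0).

    Assumes `candidates` holds at least len(positives) non-positive items.
    """
    rng = random.Random(seed)
    known = set(positives)
    pool = [c for c in candidates if c not in known]
    negatives = rng.sample(pool, len(positives))
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]

def k_fold_splits(data, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Integers stand in for reactions or sequences (hypothetical data).
data = balanced_dataset(list(range(10)), list(range(100)))
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))
```

Each example appears in exactly one test fold, so every prediction used for scoring comes from a model that did not see that example during training.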
EC Number Classification
• Using only reactions
• Using only protein sequences
EC Classification
Class 1, Class 1.1, Class 1.1.1, Class 1.1.1.1
• Using both sequences & reactions
• Signature Product Kernel
Comparison with other Methods
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• AUC = area under the ROC curve
• Precision = TP/(TP+FP)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Jaccard coefficient = TP/(TP+FP+FN)
• A larger number indicates better results
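The count-based measures listed above translate directly into code. The sketch below uses the formulas from this slide; the function name and the example counts are made up for illustration, and AUC is left out because it needs ranked prediction scores rather than confusion-matrix counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the count-based measures from confusion-matrix entries."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }

# Hypothetical counts, not results from the paper.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Jaccard coefficient ignores true negatives, which makes it the strictest of these measures when negatives are plentiful.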
• Test set: EC numbers accepted in September 2006
• Predict whether or not a given enzyme will catalyze a given reaction
• Signature Product Kernel
Predicting New Enzyme Interactions
Predict DrugBank using KEGG
Area under ROC = 0.74
• Signature Product Kernel
• Class I: Both in training set
• Class II: Different partners
• Class III: Only target
• Class IV: Only drug
• Class V: None
• Unified method for predicting protein-chemical interactions
• Atomistic structure representation of proteins encompasses information stored in substitution matrices
Conclusion