05/02/2008
Jae Hyun Kim
Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor
Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.
Contents
• Terminology
• Motivation
• Method
  • Molecular Signature
  • Signature Kernel
  • Signature Product Kernel
• Results
• Conclusion
jaekim@ku.edu
Terminology (1)
• Catalyst
  • Increases the rate of a chemical reaction / biological process
  • Remains unchanged
• Enzyme
  • Biomolecules that catalyze chemical reactions
  • Usually proteins
• Metabolite
  • Intermediates and products of metabolism
  • Restricted to small molecules
Reference: www.wikipedia.org
Terminology (2)
• Inhibitor
  • Molecules that decrease enzyme activity
  • Compete with substrates
  • Most drugs and poisons are inhibitors
Reference: www.wikipedia.org
Enzyme Commission (EC) Number
• EC Number
  • Numerical classification scheme for enzyme-catalyzed reactions
  • Four levels of hierarchy
• Example: EC 3.4.11.4 (tripeptide aminopeptidases)
  • EC 3: hydrolases (enzymes that use water to break up some other molecule)
  • EC 3.4: hydrolases that act on peptide bonds
  • EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  • EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide
Reference: www.wikipedia.org
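The four-level hierarchy in the EC 3.4.11.4 example can be made concrete with a small helper. This is a minimal sketch; the function name `ec_levels` is hypothetical and not part of any EC or KEGG tooling.

```python
def ec_levels(ec_number):
    """Return the four hierarchy prefixes of an EC number, broadest first."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has four dot-separated fields")
    # Prefix of length 1 is the class, length 4 is the full EC number.
    return [".".join(parts[:i]) for i in range(1, 5)]


print(ec_levels("3.4.11.4"))
```

Two enzymes share a level of the hierarchy when the corresponding prefixes are equal, which is the sense in which classification is later evaluated per level (Class 1, Class 1.1, and so on).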
Motivation
Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor
• Protein-Chemical Interaction
• Large-scale
• Machine-learning Technique
Molecular Signature
• G = (V, E): molecular graph
  • V: vertex (atom) set
  • E: edge (bond) set
• Atomic signature
  • Canonical representation of the subgraph surrounding a particular atom
  • Includes atoms and bonds up to a predefined distance (height)
• Molecular signature of G: $^h\sigma(G)$
  • $^h\sigma_G(x)$: atomic signature in G rooted at x of height h
• Height
  • Chemicals: 0~6
  • Proteins: 6~18 (amino acid residues 1~7)
Molecular Signature: Example
[Figure: signatures of Leucine, Isoleucine, and Glycine]
• Depth-first search up to "height" deep
• '(' going down, ')' going back up
• c_, n_: sp3 carbon/nitrogen atom
• c=, o=: sp2 (double-bond) carbon/oxygen atom
• h_: hydrogen
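The depth-first construction can be sketched in a few lines. This is a simplified illustration, not the canonicalization used in the cited paper: it assumes the molecule is given as atom-type labels plus neighbor lists, orders sibling branches by plain string sorting, and ignores bond types beyond what the atom labels encode. The glycine encoding and the `o_` label for the hydroxyl oxygen are assumptions made for the example.

```python
from collections import Counter

# Glycine (NH2-CH2-COOH); indices and the "o_" label are illustrative assumptions.
GLYCINE_ATOMS = {0: "n_", 1: "c_", 2: "c=", 3: "o=", 4: "o_",
                 5: "h_", 6: "h_", 7: "h_", 8: "h_", 9: "h_"}
GLYCINE_BONDS = {0: [1, 5, 6], 1: [0, 2, 7, 8], 2: [1, 3, 4], 3: [2],
                 4: [2, 9], 5: [0], 6: [0], 7: [1], 8: [1], 9: [4]}


def atomic_signature(atoms, bonds, root, height):
    """Depth-first walk from root, writing '(' going down and ')' coming back up."""
    def visit(atom, parent, depth):
        label = atoms[atom]
        if depth == 0:
            return label
        # Expand every neighbor except the atom we arrived from.
        children = sorted(visit(n, atom, depth - 1)
                          for n in bonds[atom] if n != parent)
        return label + "(" + " ".join(children) + ")" if children else label
    return visit(root, None, height)


def molecular_signature(atoms, bonds, height):
    """Count how often each atomic signature occurs in the molecule."""
    return Counter(atomic_signature(atoms, bonds, a, height) for a in atoms)


print(atomic_signature(GLYCINE_ATOMS, GLYCINE_BONDS, 1, 1))  # c_(c= h_ h_ n_)
print(molecular_signature(GLYCINE_ATOMS, GLYCINE_BONDS, 0)["h_"])  # 5
```

Sorting the child strings is what makes the output independent of the order in which neighbors are listed; a production implementation would need a proper canonical ordering.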
Reaction Signature
• General form of enzymatic reaction R: $s_1S_1 + s_2S_2 + \dots + s_nS_n \rightarrow p_1P_1 + p_2P_2 + \dots + p_mP_m$
• Height h signature of reaction R
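One way to read the reaction signature is as a signed count vector. The sketch below assumes the height-h signature of R is the stoichiometry-weighted sum of the product signatures minus that of the substrate signatures; this sign convention is an assumption about the cited paper, and the function name and the toy height-0 counts are hypothetical.

```python
def reaction_signature(substrates, products):
    """Each argument is a list of (stoichiometric coefficient, signature dict) pairs.

    Assumed convention: products count positively, substrates negatively.
    """
    total = {}
    for coeff, sig in products:
        for key, count in sig.items():
            total[key] = total.get(key, 0) + coeff * count
    for coeff, sig in substrates:
        for key, count in sig.items():
            total[key] = total.get(key, 0) - coeff * count
    # Drop signatures that cancel out between the two sides.
    return {key: value for key, value in total.items() if value != 0}


# Toy height-0 signatures (atom-type counts), for illustration only.
ethanol = {"c_": 2, "o_": 1, "h_": 6}
acetaldehyde = {"c_": 1, "c=": 1, "o=": 1, "h_": 4}
hydrogen = {"h_": 2}

print(reaction_signature([(1, ethanol)], [(1, acetaldehyde), (1, hydrogen)]))
```

Under this reading, atoms whose environment is unchanged by the reaction cancel, so the vector records only what the reaction transforms.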
Pairwise Kernel
• To predict/classify protein-protein interactions
• To measure similarity between two pairs of proteins
  • Kernel function K((X1, X2), (X'1, X'2))
• How to measure similarity between pairs?
Kernel Types
• Pairwise similarity by component similarity
  • If X1 ~ X1' and X2 ~ X2', then (X1, X2) ~ (X1', X2')
• Assess similarity between pairs directly
  • $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$: pairwise representation of (X1, X2)
• Similarity inside the pair
• Similarity between pairs
From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
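The two views of pair similarity can be related numerically. The sketch below assumes a linear base kernel on plain feature vectors and the symmetric form K((X1,X2),(X1',X2')) = K(X1,X1')K(X2,X2') + K(X1,X2')K(X2,X1'); the function names are hypothetical. It also builds the explicit pairwise representation x12 and shows that its dot product equals the kernel form up to a constant factor of 2.

```python
def dot(u, v):
    """Linear base kernel between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_kernel(x1, x2, y1, y2, base=dot):
    """Symmetric kernel between the pairs (x1, x2) and (y1, y2)."""
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)


def pair_features(x1, x2):
    """Explicit pairwise representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    return [x1[i] * x2[j] + x2[i] * x1[j]
            for i in range(len(x1)) for j in range(len(x2))]


x1, x2 = [1, 2], [0, 1]
y1, y2 = [2, 1], [1, 1]
print(pairwise_kernel(x1, x2, y1, y2))                        # 7
print(dot(pair_features(x1, x2), pair_features(y1, y2)))      # 14
```

The kernel form never materializes the quadratic-size feature vector, which is the practical reason to prefer it when the base vectors are long.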
Signature Kernel
• Definition
• Applies to chemicals, proteins, and reactions
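A minimal sketch of a signature kernel, assuming it is the inner product of two signature count vectors (signature string mapped to number of occurrences). Whether the cited paper normalizes the kernel is not stated here, so the normalized variant is offered only as a common option; both function names are hypothetical.

```python
import math


def signature_kernel(sig_a, sig_b):
    """Inner product of two sparse signature count vectors stored as dicts."""
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())


def normalized_signature_kernel(sig_a, sig_b):
    """Cosine-style normalization so that k(A, A) = 1 (optional, assumed)."""
    denom = math.sqrt(signature_kernel(sig_a, sig_a) * signature_kernel(sig_b, sig_b))
    return signature_kernel(sig_a, sig_b) / denom if denom else 0.0


# Toy height-0 signatures, for illustration only.
a = {"c_": 2, "o_": 1, "h_": 6}
b = {"c_": 1, "c=": 1, "o=": 1, "h_": 4}
print(signature_kernel(a, b))  # 2*1 + 6*4 = 26
```

Because the same function works on any dict of counts, it applies unchanged to chemicals, proteins, and reaction signatures, including reaction vectors with negative entries.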
Signature Product Kernel (1/2)
• P: protein, C: chemical
• Definition: signature of complex PC
• Two pairs of P-C interaction: (P, C) and (Q, D)
Signature Product Kernel (2/2)
• Similarly,
• Therefore,
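The step from the complex signature to a product of kernels can be sketched as follows. This derivation assumes the signature of the complex PC is the tensor product of the protein and chemical signatures, and that k denotes the signature kernel (an inner product of signatures); both are assumptions rather than formulas taken from the slides.

```latex
% Assumption: the signature of a complex is the tensor product of its parts.
{}^{h}\sigma(PC) = {}^{h}\sigma(P) \otimes {}^{h}\sigma(C), \qquad
{}^{h}\sigma(QD) = {}^{h}\sigma(Q) \otimes {}^{h}\sigma(D)

% The inner product of two tensor products factorizes:
K\big((P,C),(Q,D)\big)
  = {}^{h}\sigma(PC) \cdot {}^{h}\sigma(QD)
  = \big({}^{h}\sigma(P) \cdot {}^{h}\sigma(Q)\big)\,
    \big({}^{h}\sigma(C) \cdot {}^{h}\sigma(D)\big)
  = k(P,Q)\, k(C,D)
```

Under these assumptions the kernel between two protein-chemical pairs costs only two signature-kernel evaluations, and the tensor product itself never has to be formed.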
Signature Kernel: Example (height 1)
[Figure: # of occurrences of each signature]
Signature Product Kernel: Example
Signature Similarity vs. Sequence Alignment Scores
• Computed for every pair of amino acids
• Correlation: chemically similar amino acids have a high BLOSUM62 score
EC Number Classification
• Positive examples: downloaded from KEGG; more than 50, max 500
• Negative examples: equal number, random selection
• Signature kernel, 5-fold cross-validation
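The evaluation setup (equal numbers of positives and randomly chosen negatives, split into five folds) can be sketched with the standard library alone. The function name, the `candidates` pool of possible negatives, and the fixed seed are assumptions for illustration; the classifier itself is left out.

```python
import random


def balanced_folds(positives, candidates, k=5, seed=0):
    """Yield (train, test) splits over a balanced set of labeled examples.

    positives:  items known to belong to the class (label 1)
    candidates: pool from which an equal number of negatives is drawn (label 0)
    """
    rng = random.Random(seed)
    negatives = rng.sample(candidates, len(positives))
    examples = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(examples)
    # Deal the shuffled examples into k folds of near-equal size.
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


splits = list(balanced_folds(list(range(10)), list(range(100, 150))))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each example appears in exactly one test fold, so averaging a metric over the five folds uses every example once for testing.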
EC Classification
• Using only reactions
• Using only protein sequences
• Using both sequences and reactions (signature product kernel)
• Levels shown: Class 1, Class 1.1, Class 1.1.1, Class 1.1.1.1
Comparison with other Methods
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• AUC = area under the curve
• Precision = TP/(TP+FP)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Jaccard coefficient = TP/(TP+FP+FN)
• A larger number indicates better results
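The count-based measures listed above translate directly into code. This is a small sketch with a hypothetical function name and made-up counts; AUC is omitted because it needs ranked scores rather than a single confusion matrix, and no guard against zero denominators is included.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the count-based measures from a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


# Made-up counts, for illustration only.
for name, value in confusion_metrics(tp=40, tn=45, fp=5, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Jaccard coefficient ignores true negatives, so it stays informative when negatives vastly outnumber positives.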
Predicting New Enzyme Interactions
• Prediction
  • Test set: EC numbers accepted in September 2006
  • Predict whether or not a given enzyme will catalyze a given reaction
• Signature product kernel
Predict DRUGBANK Using KEGG
• Area under ROC = 0.74
• Signature product kernel
• Class I: both in training set
• Class II: different partners
• Class III: only target
• Class IV: only drug
• Class V: none
Conclusion
• Unified method for predicting protein-chemical interactions
• Atomistic structure representation of proteins encompasses information stored in substitution matrices