05/02/2008 jae hyun kim genome scale enzyme-metabolite and drug-target interaction predictions using...

Post on 19-Jan-2016


05/02/2008

Jae Hyun Kim

Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor

Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.

Contents

Terminology
Motivation
Method
  Molecular Signature
  Signature Kernel
  Signature Product Kernel
Results
Conclusion

jaekim@ku.edu

Terminology (1)

Catalyst: increases the rate of a chemical reaction or biological process; remains unchanged itself.
Enzyme: a biomolecule that catalyzes chemical reactions; usually a protein.
Metabolite: an intermediate or product of metabolism; the term is restricted to small molecules.

Reference:www.wikipedia.org

Terminology (2)

Inhibitor: a molecule that decreases enzyme activity, for example by competing with substrates. Many drugs and poisons are enzyme inhibitors.

Reference:www.wikipedia.org

Enzyme Commission (EC) Number

EC number: a numerical classification scheme for enzyme-catalyzed reactions, with four levels of hierarchy.

Example: EC 3.4.11.4 (tripeptide aminopeptidases)
  EC 3: hydrolases (enzymes that use water to break up some other molecule)
  EC 3.4: hydrolases that act on peptide bonds
  EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide

Reference:www.wikipedia.org
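The four-level hierarchy can be made concrete with a short Python sketch that expands an EC number into its nested class prefixes. The function name `ec_levels` is hypothetical and used only for illustration.

```python
def ec_levels(ec_number):
    """Return the hierarchical prefixes of an EC number, coarsest first."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four levels, got %r" % ec_number)
    # Level i is the first i fields joined by dots.
    return [".".join(parts[:i]) for i in range(1, 5)]


levels = ec_levels("EC 3.4.11.4")
print(levels)  # ['3', '3.4', '3.4.11', '3.4.11.4']
```

Each prefix corresponds to one line of the example above, from the hydrolase class down to the tripeptide aminopeptidases.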

Motivation

The parts of the paper title map onto three themes:
  Genome scale: large-scale
  Enzyme-metabolite and drug-target interaction predictions: protein-chemical interaction
  Using the signature molecular descriptor: machine-learning technique

Molecular Signature

G = (V, E): molecular graph
  V: vertex (atom) set
  E: edge (bond) set

Atomic signature: a canonical representation of the subgraph surrounding a particular atom, including atoms and bonds up to a predefined distance (height).

Molecular signature of G: ${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma_G(x)$
  ${}^{h}\sigma_G(x)$: atomic signature in G rooted at x of height h

Height
  Chemicals: 0 to 6
  Proteins: 6 to 18 (amino acid residues 1 to 7)

Molecular Signature: Example

[Figure: example signatures for leucine, isoleucine and glycine]

• Depth-first search up to "height" deep
• '(' going down, ')' going back up

c_, n_: sp3 carbon/nitrogen atom
c=, o=: sp2 (double-bond) carbon/oxygen atom
h_: hydrogen
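The depth-first construction can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it ignores bond orders and ring closure, canonicalizes only by sorting child strings, and the names `atomic_signature` and `molecular_signature` are hypothetical.

```python
from collections import Counter


def atomic_signature(labels, bonds, root, height):
    """Canonical string for the neighborhood of `root` up to `height` bonds.

    labels: dict mapping atom index -> atom type (e.g. "c_", "h_")
    bonds:  dict mapping atom index -> list of neighboring atom indices
    """
    def visit(atom, parent, depth):
        if depth == 0:
            return labels[atom]
        # Descend to every neighbor except the atom we came from;
        # sorting the child strings makes the result order-independent.
        children = sorted(visit(n, atom, depth - 1)
                          for n in bonds[atom] if n != parent)
        if not children:
            return labels[atom]
        # '(' marks going down one level, ')' marks coming back up.
        return labels[atom] + "(" + " ".join(children) + ")"

    return visit(root, None, height)


def molecular_signature(labels, bonds, height):
    """Count how often each atomic signature occurs in the molecule."""
    return Counter(atomic_signature(labels, bonds, x, height) for x in labels)


# Methane: one sp3 carbon bonded to four hydrogens.
labels = {0: "c_", 1: "h_", 2: "h_", 3: "h_", 4: "h_"}
bonds = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(molecular_signature(labels, bonds, 1))
```

For methane at height 1 this yields one carbon signature `c_(h_ h_ h_ h_)` and four copies of the hydrogen signature `h_(c_)`.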

Reaction Signature

General form of an enzymatic reaction R:
  $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$

[Equation: height-h signature of reaction R]
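The reaction-signature equation is not preserved in this transcript. A common way to define it, and to my understanding the one used in the cited paper, is the stoichiometry-weighted difference between product and substrate signatures, ${}^{h}\sigma(R) = \sum_i p_i\,{}^{h}\sigma(P_i) - \sum_j s_j\,{}^{h}\sigma(S_j)$. The Python sketch below assumes that definition; the function name and the toy feature labels "a", "b", "c" are hypothetical.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Signature of a reaction as products minus substrates (assumed form).

    substrates, products: lists of (stoichiometric coefficient, Counter),
    where each Counter is a molecular signature.
    """
    sig = Counter()
    for coeff, mol_sig in products:
        for key, count in mol_sig.items():
            sig[key] += coeff * count
    for coeff, mol_sig in substrates:
        for key, count in mol_sig.items():
            sig[key] -= coeff * count
    # Features that cancel are dropped; negative counts are kept on purpose.
    return {k: v for k, v in sig.items() if v != 0}


# Toy reaction S1 + 2 S2 -> P1 with made-up signatures.
s1 = Counter({"a": 2, "b": 1})
s2 = Counter({"b": 1})
p1 = Counter({"a": 2, "c": 1})
print(reaction_signature([(1, s1), (2, s2)], [(1, p1)]))
```

Under this assumption, features untouched by the reaction cancel out, so the signature records only what the reaction creates (positive counts) and consumes (negative counts).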

Pairwise Kernel

Goal: predict/classify protein-protein interactions
Requirement: measure the similarity between two pairs of proteins
Kernel function: K((X1, X2), (X1', X2'))

How to measure similarity between pairs?

Kernel Types

Pairwise similarity by component similarity
  If X1 ~ X1' and X2 ~ X2', then (X1, X2) ~ (X1', X2')

Direct assessment of similarity between pairs
  x12 = (x1i x2j + x2i x1j): pairwise representation of (X1, X2)

[Equations labelled on the slide: similarity inside the pair; similarity between pairs]

From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
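The component-similarity idea can be illustrated with a small Python sketch of a symmetric pairwise kernel built from a base kernel on single items, K((X1, X2), (X1', X2')) = K'(X1, X1') K'(X2, X2') + K'(X1, X2') K'(X2, X1'). This is my reading of the pairwise kernel in the Ben-Hur and Noble reference; the function names and the numeric vectors are made up for illustration.

```python
def dot(u, v):
    """Base kernel: plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_kernel(pair_a, pair_b, base=dot):
    """Similarity between two pairs, symmetric in the order inside each pair."""
    (x1, x2), (y1, y2) = pair_a, pair_b
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)


def pair_features(x1, x2):
    """Explicit pairwise representation x12[i, j] = x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i in range(n) for j in range(n)]


x1, x2 = [1, 2], [3, 1]
y1, y2 = [0, 1], [2, 2]
k = pairwise_kernel((x1, x2), (y1, y2))
explicit = dot(pair_features(x1, x2), pair_features(y1, y2))
print(k, explicit)  # 22 44
```

With a dot-product base kernel, the inner product of the explicit pairwise representations is exactly twice the kernel value, so the two views agree up to a constant factor while the kernel form avoids building the quadratic-size feature vector.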

Signature Kernel

Definition: [equation]

Applies to chemicals, proteins, and reactions
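The defining equation is not preserved in this transcript. A minimal Python sketch, assuming the signature kernel is the dot product of two signature vectors whose entries are occurrence counts, is shown below; the function name and the hand-written height-1 signature strings are illustrative only.

```python
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Dot product of two signatures stored as sparse count vectors."""
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())


# Hand-written height-1 signatures (assumed notation) for two small molecules.
methane = Counter({"c_(h_ h_ h_ h_)": 1, "h_(c_)": 4})
ethane = Counter({"c_(c_ h_ h_ h_)": 2, "h_(c_)": 6})

print(signature_kernel(methane, ethane))   # 24
print(signature_kernel(methane, methane))  # 17
```

Only atomic signatures shared by both objects contribute, which is why the same function can be applied to chemicals, proteins, and reactions once each is expressed as a signature.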

Signature Product Kernel (1/2)

P: protein, C: chemical

Definition: signature of the complex PC [equation]

Two pairs of P-C interactions: (P, C) and (Q, D)

Signature Product Kernel (2/2)

Similarly, [equation]

Therefore, [equation]
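The equations on these two slides are not preserved. The sketch below assumes the usual construction: the signature of a complex PC is the tensor product of the protein and chemical signatures, so the kernel between (P, C) and (Q, D) factorizes into k(P, Q) k(C, D). All names and the toy feature labels are hypothetical.

```python
from itertools import product


def dot(sig_a, sig_b):
    """Dot product of two sparse count vectors stored as dicts."""
    return sum(v * sig_b.get(k, 0) for k, v in sig_a.items())


def complex_signature(protein_sig, chemical_sig):
    """Tensor product: one entry per (protein feature, chemical feature)."""
    return {(p, c): pv * cv
            for (p, pv), (c, cv) in product(protein_sig.items(),
                                            chemical_sig.items())}


def signature_product_kernel(pair_a, pair_b):
    """Kernel between two protein-chemical pairs without building the tensor."""
    (p, c), (q, d) = pair_a, pair_b
    return dot(p, q) * dot(c, d)


# Toy signatures with made-up feature names.
P, Q = {"s1": 2, "s2": 1}, {"s1": 1, "s3": 4}
C, D = {"t1": 3}, {"t1": 1, "t2": 2}

fast = signature_product_kernel((P, C), (Q, D))
explicit = dot(complex_signature(P, C), complex_signature(Q, D))
print(fast, explicit)  # 6 6
```

The factorized form gives the same value as the explicit tensor product, but its cost grows with the sum rather than the product of the two signature sizes.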

Signature Kernel: Example (height 1)

[Figure: worked example; column label "# of occurrence"]

Signature Product Kernel: Example

[Figure: worked example]

Signature Similarity vs. Sequence Alignment Scores

• Computed for every pair of amino acids
• Correlation: chemically similar amino acid pairs have high BLOSUM62 scores

EC Number Classification

Positive examples: downloaded from KEGG; more than 50, at most 500
Negative examples: equal number, random selection
Signature kernel, 5-fold cross-validation
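The experimental setup (equal numbers of positives and randomly chosen negatives, evaluated by 5-fold cross-validation) can be sketched in plain Python. The slide does not say how negatives were drawn beyond "random selection", so the sampling pool, the seeds, and the identifiers below are assumptions.

```python
import random


def balanced_dataset(positives, candidates, seed=0):
    """Pair each positive with one randomly drawn non-positive candidate."""
    rng = random.Random(seed)
    positive_set = set(positives)
    pool = [c for c in candidates if c not in positive_set]
    negatives = rng.sample(pool, len(positives))
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]


def k_fold_splits(examples, k=5, seed=0):
    """Yield (train, test) lists; every example is tested exactly once."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


# Hypothetical identifiers: 60 positives drawn from 200 candidates.
positives = ["E%d" % i for i in range(60)]
candidates = ["E%d" % i for i in range(200)]
data = balanced_dataset(positives, candidates)
splits = list(k_fold_splits(data, k=5))
print(len(data), [len(test) for _, test in splits])
```

In a full experiment each training fold would be used to fit a kernel classifier on the signature kernel, with the held-out fold used for scoring.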

EC Classification

[Figure panels: using only reactions; using only protein sequences]

[Figure panels: Class 1; Class 1.1; Class 1.1.1; Class 1.1.1.1]

• Using both sequences and reactions
• Signature product kernel

Comparison with Other Methods

• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• AUC = area under the curve
• Precision = TP/(TP+FP)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Jaccard coefficient = TP/(TP+FP+FN)
• A larger number indicates better results
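The count-based measures listed above translate directly into code. The Python sketch below computes them from a confusion matrix; the counts in the example are made up and the function name is hypothetical. AUC is omitted because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluation measures computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


# Made-up counts for illustration only.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print("%-12s %.3f" % (name, value))
```

The Jaccard coefficient ignores true negatives, which makes it a stricter score than accuracy when negatives are easy to classify.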

Predicting New Enzyme Interactions

Test set: EC numbers accepted in September 2006
Task: predict whether or not a given enzyme will catalyze a given reaction
Method: signature product kernel

Predict DRUGBANK Using KEGG

• Signature product kernel
• Area under ROC = 0.74

Classes of test pairs:
• Class I: both in training set
• Class II: different partners
• Class III: only target
• Class IV: only drug
• Class V: none
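One possible reading of the five classes is sketched below in Python. It assumes that Class I means the drug-target pair itself occurs in the training set, and that Class II means both molecules occur in training but only with other partners. The slide does not spell this out, so treat the rules, the function name, and the identifiers as assumptions.

```python
def pair_class(drug, target, training_pairs):
    """Assign a test pair to Class I..V by its overlap with the training set."""
    drugs = {d for d, _ in training_pairs}
    targets = {t for _, t in training_pairs}
    if (drug, target) in training_pairs:
        return "I"    # assumed: the pair itself was seen in training
    if drug in drugs and target in targets:
        return "II"   # both seen, but only with different partners
    if target in targets:
        return "III"  # only the target was seen
    if drug in drugs:
        return "IV"   # only the drug was seen
    return "V"        # neither was seen


# Hypothetical training pairs.
training = {("d1", "t1"), ("d2", "t2")}
for pair in [("d1", "t1"), ("d1", "t2"), ("d9", "t1"), ("d1", "t9"), ("d9", "t9")]:
    print(pair, pair_class(pair[0], pair[1], training))
```

Grouping test pairs this way separates pairs the model has effectively seen from those that require generalizing to an unseen drug, an unseen target, or both.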

Conclusion

• A unified method for predicting protein-chemical interactions
• An atomistic structure representation of proteins encompasses the information stored in substitution matrices
