Bioinformatics of Disease: immune epitope prediction
Shoba Ranganathan
Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore ([email protected])
Visiting scientist @ Institute for Infocomm Research (I2R), Singapore
Bioinformatics is …..
Bioinformatics is the study of living systems through computation
Data in Bioinformatics (in the main) and their management and analysis
• Sequences, genomes, transcriptomes
• Structures
• Networks, pathways and systems
• Genetics and populations
• Evolution and phylogenetics
• Databases, ontologies
• Data & text mining
• Maths/Stats
• Algorithms
• Physics/Chemistry
Overview of my research
1. Genome analysis
2. Transcriptome analysis
3. Protein/Proteome analysis
4. Systems Biology
5. Immunoinformatics
6. Genome-phenome mapping
7. Biodiversity Informatics
5. What is Immunoinformatics?
Using Bioinformatics to address problems in Immunology
Application of bioinformatics to accelerate immune system research has the potential to deliver vaccines and address immunotherapeutics.
Computational systems biology of immune response
IMMUNOINFORMATICS
• Immunology
• Computer Science
• Biology
• Networks, pathways and systems
• Maths/Stats
• Databases
• Artificial intelligence
• Algorithms
• Cell biology
• -omics
• Basic immunology
• Clinical immunology
• Physics/Chemistry
Summary
Introduction, Structural Immunoinformatics, Database development, Data Analysis, Computational models, Applications
Networks, pathways and systems
Genetics and populations
-omics
Basic immunology
Clinical immunology
The immune system
• Composed of many interdependent cell types, organs, and tissues to protect the body from infections (bacterial, parasitic, fungal, or viral) and arrest abnormal growth and differentiation
• Inappropriate immune responses lead to allergies and autoimmunity
• 2nd most complex system in the human body
Genomics vs. Immunomics
• Genomics: solving the genome puzzle. 10^4 genes coding for 10^6 products
• Immunomics: understanding immune response. 10^2-10^3 genes leading to >10^12 products
• Enormous diversity in immunomics has implications for immune function and modulation
It is a numbers game….
• >10^13 MHC class I haplotypes (IMGT-HLA)
• 10^7-10^15 T cell receptors (Arstila et al., 1999)
• >10^9 combinatorial antibodies (Jerne, 1993)
• 10^12 B cell clonotypes (Jerne, 1993)
• 10^11 linear epitopes composed of nine amino acids
• >>10^11 conformational epitopes
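The order of magnitude quoted for linear nine-residue epitopes can be checked with a one-line count. The sketch below assumes the 20 standard amino acids and free choice at every position; the function name is illustrative only.

```python
# Count distinct linear peptides of a given length over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: standard 20-letter alphabet

def linear_epitope_count(length: int, alphabet: str = AMINO_ACIDS) -> int:
    """Number of distinct sequences of `length` residues drawn from `alphabet`."""
    return len(alphabet) ** length

nonamers = linear_epitope_count(9)
print(f"{nonamers:.2e}")  # 5.12e+11, i.e. of the order of 10^11
```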
T cell mediated adaptive immune response
• Specific peptide residues are critical for stimulating cellular immune responses
• Major histocompatibility complex (MHC) molecules (Human Leukocyte Antigen or HLA in humans) bind and present short antigenic peptides to T cell receptors, for inspection
• Antigen presentation is by two classes of MHC (class I and class II)
• Those peptides that bind to specific MHC and trigger T cell recognition (T cell epitopes) are targets for vaccine and immunotherapy development
How to generate a T cell-mediated immune response
1. Epitope
2. MHC
3. T cell receptor
Major histocompatibility complex
MHC Class II
Gene structure of the human MHC
MHC Class I
3D structure of the human MHC
MHC Class I for endogenous peptides
Figure by Eric A.J. Reits
MHC class II for exogenous peptides
Figure by Eric A.J. Reits
Antigen processing pathway: peptides, MHC, T-cells
Yewdell et al. Ann. Rev Immunol (1999)
1. Degradation of antigen: 20% processed
2. Peptide binding to MHC: 0.5% bind MHC
3. Recognition of peptide-MHC complex by T-cells: 50% CTL response
Overall: 0.05% chance of immunogenicity
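The overall figure follows from multiplying the three stage fractions, on the assumption that the stages act as independent sequential filters. A minimal sketch (the dictionary keys are illustrative labels, not terms from the cited review):

```python
# Multiply the per-stage fractions quoted on the slide to get the overall chance.
stage_fractions = {
    "processed": 0.20,      # 20% of peptides are processed
    "bind_mhc": 0.005,      # 0.5% bind MHC
    "ctl_response": 0.50,   # 50% elicit a CTL response
}

def overall_fraction(fractions: dict) -> float:
    """Product of all stage fractions, treating stages as independent filters."""
    total = 1.0
    for value in fractions.values():
        total *= value
    return total

p = overall_fraction(stage_fractions)
print(f"{p:.4%}")  # 0.0500%, roughly 1 peptide in 2000
```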
Physico-chemical properties affect MHC-peptide binding
Epitope prediction: “Fishing”
• Suggest candidate epitopes by in silico screening of entire proteins and even proteomes with specificity at: the allele level; the supertype level; disease-implicated alleles alone
• Minimize the number of wet-lab experiments
• Cut down the lead time involved in epitope discovery and vaccine design
Computational models can help identify T cell epitopes
1. Sequence-based approach: pattern recognition techniques
• Binding motif, matrices, ANN, HMM, SVM
• Main limitations: require large amount of data for training; preclude data with limited sequence conservation
2. Structure-based approach
• Rigid backbone modeling techniques
• Flexible docking techniques
• Main advantage: large training datasets unnecessary
Predicting MHC-binding peptides: Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8: 96-108
Why structure?
Our aim: structure-based prediction of MHC-binding peptides
Great potential to:
• generate biologically meaningful data for analysis
• predict candidate peptides for alleles that have not been widely studied, where sequence-based approaches fail or are not attempted
• predict binding affinity of peptides
• predict non-contiguous epitopes
Structure determination through experimental methods is both expensive and time-consuming
Structure-based prediction has not been extensively studied due to high computational costs and development complexity
Existing Structure-based Prediction Techniques
• Protein Threading [Altuvia et al. 1995; Schueler-Furman et al. 2000]
• Homology Modeling [Michielin et al. 2000]
• Rigid/Flexible Docking [Rosenfeld et al. 1993; Sezerman et al. 1996; Rognan et al. 1999; Desmet et al. 2000; Michielin et al. 2003]
Hypothesis for epitope selection
• Peptides bound to MHC alleles are similar to substrates bound to enzymes
• “Lock-and-key” mechanism for peptide selection: shape, size, electrostatic characteristics
Introduction, Structural Immunoinformatics, Database development, Data Analysis, Computational models, Applications
Sequences
Databases, ontologies
Basic immunology
Genetics and populations
Structures
MPID: MHC-Peptide Interaction Database
Govindarajan et al. (2003) Bioinformatics, 19: 309-310
RDB of 82 curated pMHC complexes (Class I: 64 & Class II: 18)
[Figure: Distribution based on MHC allele specificity. Counts (y-axis, 0 to 25) per MHC allele (x-axis): A*0201, A*6801, B*0801, B*2705, B*3501, B*5101, B*5301, DQ8, DR1, DR2, DR3, DR4, H2-Db, H2-Dd, H2-Kb, H2-Ld, HLA-Cw3, HLA-Cw4, I-Ad, I-Ak, RT1.Aa]
Peptide/MHC interaction characteristics
• Interface area
• Intermolecular hydrogen bonds
• Gap volume
• Gap index = Gap volume / Interface area
• Interacting residues
• Peptide length
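The gap index is a simple ratio, so it can be sketched in a few lines. The function name and the example numbers below are hypothetical and are not taken from MPID; the units assumed are Å^3 for gap volume and Å^2 for interface area, which gives a gap index in Å.

```python
def gap_index(gap_volume: float, interface_area: float) -> float:
    """Gap index = gap volume / interface area (assumed units: Å^3 / Å^2)."""
    if interface_area <= 0:
        raise ValueError("interface area must be positive")
    return gap_volume / interface_area

# Hypothetical complex: 800 Å^3 gap volume over an 850 Å^2 interface.
example = gap_index(800.0, 850.0)
print(round(example, 2))  # 0.94
```

A lower gap index indicates a tighter fit between peptide and MHC for a given interface size.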
MPID-T: MHC-Peptide-T Cell Receptor Interaction Database
Tong et al. (2006) Applied Bioinformatics, 5: 111-114
• 187 curated pMHC entries, 16 with TCR (Human: 110; Murine: 74; Rat: 3); 40 alleles
• Interaction characteristics: interface area, H bonds, gap volume and gap index
• 101 new entries
• 134 non-redundant entries (class I: 100; class II: 34)
• 121 class I and 41 class II entries
• 26 HLA alleles (class I: 18; class II: 8)
• 14 rodent alleles (class I: 8; class II: 6)
• 16 TCR/peptide/MHC complexes
Distribution of MHC by allele
Peptide/MHC binding motifs
• Conserved peptide properties in solution structures
• Classified according to alleles and peptide length
• Residue classes: polar, amide, basic, acidic, hydrophobic
How to obtain structures of experimentally unsolved alleles?
1. There were only 36 crystal structures of unique MHC alleles (2006) vs. 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive and time-consuming
3. Homology model building for alleles with no structural data!
Introduction, Structural Immunoinformatics, Database development, Data Analysis of pMHC Class I complexes, Computational models, Applications
Data & text mining
Maths/Stats
Structures
MHC Class I superfamilies have different interaction characteristics

Superfamily            HLA-A2 (36 entries)    HLA-B7 (12 entries)    HLA-B27 (18 entries)
Interface area (Å^2)   846.3±48.9             876.7±72.4             934.0±136.0
Gap volume (Å^3)       799.8±195.2            870.2±198.0            985.1±101.5
Gap index              0.9±0.2                1.0±0.1                1.0±0.3
Hydrogen bonds         11.1±1.9               14.3±2.3               17.9±2.8
H-bond distribution    Pockets A, B, F        Well distributed       Pockets A, B, F
Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)
Details
• Data: 68 peptide–HLA complexes spanning 13 class I alleles from MPID-T
• Hierarchical clustering: agglomerative algorithm, with the distance between structures computed by the single-linkage method (MATLAB version 7.0) based on the separation between each pair of data points. Nearest neighbors were merged into clusters; smaller clusters were then merged into larger clusters based on inter-cluster distances, until all structures were combined. The last 3 levels were considered for defining HLA class I supertypes.
• Interaction parameters significant for the characterization of the peptide/MHC interface: intermolecular hydrogen bonds, pMHC interface area, gap volume, gap index
• Binding characteristics of HLA supertypes analyzed
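The agglomerative single-linkage procedure described above can be sketched as follows. The slides report using MATLAB 7.0; this is a plain-Python stand-in, not the original code. The descriptor values are made up for illustration, and in practice the descriptors would probably need scaling to comparable ranges before distances are computed.

```python
from itertools import combinations
from math import dist

def single_linkage(points, n_clusters):
    """Merge the two nearest clusters until n_clusters remain.

    Inter-cluster distance is the smallest Euclidean distance between
    any member of one cluster and any member of the other.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters]

# Hypothetical (hydrogen bonds, gap index) descriptors for five complexes.
toy = [(11.0, 0.90), (11.5, 0.95), (14.0, 1.00), (14.5, 1.00), (18.0, 1.05)]
groups = single_linkage(toy, n_clusters=3)
print(groups)
```

Stopping at a chosen number of clusters plays the role of cutting the dendrogram at one of its last levels.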
[Figure: cluster diagram; legend: B27, B44, B7, B62, B8]
Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?
• Supertype definition: receptor structure, ligand binding motifs, or receptor-ligand interaction patterns
• 80 HLA class I complexes
• 13 class I alleles
• Five descriptors
• Hierarchical clustering using nearest neighbor algorithm
• 77% consensus with data from other groups
MHC Class I superfamilies from receptor-ligand interactions
B27 B44 B7 B62 B8
Legend
Tong, Tan and Ranganathan (2007) Bioinformatics, 23: 177-183
Introduction, Structural Immunoinformatics, Database development, Data Analysis, Computational models, Applications
Maths/Stats
Structures
Sequences
Physics/Chemistry
Two-step approach to predict MHC-binding peptides
1. Finding the best fit conformation (docking) of peptides within the MHC binding groove
2. Screening potential binders from the background
Docking is a computationally exhaustive procedure
Large number of possible peptide conformations:
• 3 global translational degrees of freedom
• 3 global rotational degrees of freedom
• 1 conformational degree of freedom for each rotatable bond
>10^10 possible conformations for a 10-residue peptide
[Figure: peptide backbone unit (N, C, C, O, R) shown against x, y, z axes]
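To see why the search space grows so quickly, the degrees of freedom can be tallied and the torsional conformations counted under a coarse discretisation. Everything numeric below is an assumption for illustration: three rotatable bonds per residue and three allowed states per bond are not figures from the slides, and the resulting count depends strongly on those choices.

```python
def degrees_of_freedom(n_rotatable_bonds: int) -> int:
    """3 translational + 3 rotational + 1 per rotatable bond."""
    return 3 + 3 + n_rotatable_bonds

def conformation_count(n_rotatable_bonds: int, states_per_bond: int) -> int:
    """Torsional conformations if every rotatable bond takes a fixed number of states."""
    return states_per_bond ** n_rotatable_bonds

# Assumption: a 10-residue peptide with 3 rotatable bonds per residue,
# each sampled at 3 discrete states.
bonds = 10 * 3
print(degrees_of_freedom(bonds))                # 36
print(conformation_count(bonds, 3) > 10**10)    # True
```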
Conservation of nonamer peptide backbone conformation
• Class I peptides: N-termini residues 0.02 – 0.29 Å; C-termini residues 0.00 – 0.25 Å
• Class II binding registers (only 9 residues fit in the binding groove): N-termini residues 0.01 – 0.22 Å; C-termini residues 0.02 – 0.27 Å
Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13:2523-2532
1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking)
2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains)
Benchmarking with existing techniques

Author             Technique                         Peptide       RMSD(a)   RMSD(b)
Rognan et al.      Simulated Annealing               TLTSCNTSV     1.04      0.46
                                                     FLPSDFFPSV    1.59      1.10
                                                     GILGFVFTL     0.46      0.32
                                                     ILKEPVHGV     0.87      0.87
                                                     LLFGYPVYV     0.78      0.33
Desmet et al.      Combinatorial Buildup Algorithm   RGYVYQGL      0.56      0.32
Rosenfeld et al.   Multiple Copy Algorithm           FAPGNYPAL     2.70      0.40
                                                     GILGFVFTL     1.40      0.32
Sezerman et al.    Combinatorial Buildup Algorithm   LLFGYPVYV     1.40      0.33
                                                     ILKGPVHGV     1.30      0.87
                                                     GILGFVFTL     1.60      0.32
                                                     TLTSCNTSV     2.20      0.46

(a) RMSD of peptide backbone obtained from respective authors.
(b) RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.
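For reference, the RMSD between a docked and a reference peptide backbone can be sketched as below. This is a generic definition, not the authors' code: it assumes both coordinate sets list the same atoms in the same order and already share a common frame (for example, the MHC frame), so no superposition step is included. The coordinates are invented.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))

# Hypothetical two-atom example: every atom displaced by 1 Å along z.
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(docked, crystal))  # 1.0
```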
Quantitative separation of binders from non-binders: empirical free energy scoring function
G_bind = αG_H + βG_S + γG_EL + C
• G_bind = binding free energy
• G_H = hydrophobic term
• G_S = decrease in side chain entropy
• G_EL = electrostatic term
• C = entropy change in system due to external factors
• α, β, γ optimized by least-square multivariate regression with experimental binding affinities (IC50) of MHC-peptides in training dataset (Rognan et al., 1999)
• G_bind ≈ -RT ln (IC50) (Rognan et al., 1999)
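The regression step can be illustrated with an ordinary least-squares fit of the weights α, β, γ and the constant C. This is a sketch under stated assumptions, not the published procedure: the helper names are hypothetical, the normal equations are solved with a small Gaussian elimination to avoid external libraries, and all numbers are synthetic (generated from known coefficients so the fit can be checked).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_scoring_function(terms, g_exp):
    """Least-squares fit; terms is a list of (G_H, G_S, G_EL). Returns (alpha, beta, gamma, C)."""
    rows = [list(t) + [1.0] for t in terms]  # trailing 1.0 carries the constant C
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * g for r, g in zip(rows, g_exp)) for i in range(k)]
    return tuple(solve_linear(ata, atb))

def g_bind(term, coeffs):
    """Evaluate G_bind = alpha*G_H + beta*G_S + gamma*G_EL + C."""
    g_h, g_s, g_el = term
    alpha, beta, gamma, c = coeffs
    return alpha * g_h + beta * g_s + gamma * g_el + c

# Synthetic training data generated from known coefficients (illustration only).
true_coeffs = (0.5, -0.2, 1.5, -3.0)
terms = [(1, 2, 3), (2, 1, 0), (0, 1, 1), (3, 0, 2), (1, 1, 4)]
energies = [g_bind(t, true_coeffs) for t in terms]
coeffs = fit_scoring_function(terms, energies)
print([round(v, 3) for v in coeffs])
```

With real data the target energies would come from the experimental IC50 values, and the fit would not be exact.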
Test case: MHC Class II DQ8
DQ3.2 (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
• Celiac disease
• Insulin-dependent diabetes mellitus (IDDM)
• IDDM-associated periodontal disease
• Autoimmune polyendocrine syndrome type II
Data used
• Structure: 1JK8, the DQ3.2β–insulin B9-23 complex
• Dataset I: 127 peptides with experimentally determined IC50 values, derived from biochemical studies: 70 high-affinity (IC50 < 500 nM), 13 medium-affinity (500 nM < IC50 < 1500 nM) and 23 low-affinity (1500 nM < IC50 < 5000 nM) binders, and 21 non-binders (IC50 > 5000 nM). 87 with known binding registers.
• Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, with 7 peptides eliciting DQ3.2β-restricted T-cell proliferation.
Scoring: Training & testing datasets
Training
• 56 binding conformations with known registers
• 30 non-binding conformations from 3 non-binders
Testing
• Test set 1: 68 peptides from biochemical studies (16 strong; 13 medium; 21 weak; 18 non-binders)
• Test set 2: 12 peptides from functional studies (7 elicit T-cell proliferation)
Screening class II binding register: a sliding window approach
E285B 112-126 peptide: YQTIEENIKIFEEDA

Core sequence   Binding Energy
YQTIEENIK       -23.12
QTIEENIKI       -21.34
TIEENIKIF       -25.32
IEENIKIFE       -29.53
EENIKIFEE       -32.27
ENIKIFEED       -21.72
NIKIFEEDA       -22.95
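The register screen amounts to enumerating every 9-residue core of the peptide and keeping the one with the lowest binding energy. In the sketch below the energies are copied from the table above rather than computed, since the docking step itself is not reproduced; the function name is illustrative.

```python
def sliding_windows(sequence: str, width: int = 9):
    """All contiguous substrings of the given width, in N-to-C order."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

peptide = "YQTIEENIKIFEEDA"  # E285B 112-126

# Binding energies as listed on the slide for each candidate core.
energies = {
    "YQTIEENIK": -23.12,
    "QTIEENIKI": -21.34,
    "TIEENIKIF": -25.32,
    "IEENIKIFE": -29.53,
    "EENIKIFEE": -32.27,
    "ENIKIFEED": -21.72,
    "NIKIFEEDA": -22.95,
}

cores = sliding_windows(peptide)
best_core = min(cores, key=energies.__getitem__)  # most negative energy wins
print(best_core, energies[best_core])  # EENIKIFEE -32.27
```

A 15-residue peptide yields 15 - 9 + 1 = 7 candidate registers, matching the seven rows of the table.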
Docking: 4-step protocol used
A. Anchoring root fragments (probes) to reduce search space
B. Loop modeling
C. Refinement of binding register
D. Extension of flanking residues for MHC Class II
Accuracy estimates
• Sensitivity (SE) = proportion of binders correctly predicted = TP/AP, where AP = TP + FN
• Specificity (SP) = proportion of non-binders correctly predicted = TN/AN, where AN = TN + FP
• Area under ROC (receiver operating characteristic) curve: >90% excellent; >80% good
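These accuracy measures can be sketched directly from their definitions. The area under the ROC curve is computed here by pairwise comparison (equivalent to the Mann-Whitney statistic), which is one standard way to obtain it and not necessarily the method used for the reported values. Lower (more negative) energy is assumed to mean stronger predicted binding, and the example scores are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP)."""
    return tn / (tn + fp)

def roc_auc(binder_scores, nonbinder_scores) -> float:
    """Probability that a random binder scores lower than a random non-binder.

    Ties count as half. Lower score = stronger predicted binding.
    """
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))

# Hypothetical predicted energies for three binders and two non-binders.
auc = roc_auc([-35.0, -30.0, -28.0], [-29.0, -20.0])
print(sensitivity(8, 2), specificity(9, 1), round(auc, 3))  # 0.8 0.9 0.833
```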
Results for Training set
High SE (good for most predictions)
Very few FPs, but also fewer predictions
Group LMH MH H
AROC 0.88 0.93 0.93
Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set I
Classification of binding peptides
• High-affinity binders (H): IC50 ≤ 500 nM
• Medium-affinity binders (M): 500 nM < IC50 ≤ 1500 nM
• Low-affinity binders (L): 1500 nM < IC50 ≤ 5000 nM
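The affinity classes translate into a simple threshold function. The sketch below uses the inclusive upper bounds given above and treats anything above 5000 nM as a non-binder, consistent with the dataset description; the function name and return labels are illustrative.

```python
def classify_affinity(ic50_nm: float) -> str:
    """Map an IC50 value in nM to H, M, L or non-binder."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nm <= 500:
        return "H"
    if ic50_nm <= 1500:
        return "M"
    if ic50_nm <= 5000:
        return "L"
    return "non-binder"

for value in (20, 1156, 5000, 12000):
    print(value, classify_affinity(value))
```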
Test Set 1: Improved detection of binders lacking position specific binding motifs

Position: 1 4 6 7 9
Binding Motif: T D R R Q S V V V N W M D D G K A A A D E I I I P D Y Y R Q E F L M

Ligands / Epitopes     Source                BE (kJ/mol)   IC50 (nM)
LQLQPFPQPQPFPPL        A-gliadin 56-70       -41.01        20
DMTPADALDDFDL          HSV                   -40.53        173
AAAAAVAAEAY            Artificial sequence   -39.98        48
GVAGLLVALAV            IA-2 499-509          -36.16        95
DSNIMNSINNVMDEIDFFEK   Pf ABRA 487–506       -36.01        171
FESTGNLIAPEYGFKISY     HA 255–271Y           -35.70        62
YPFIEQEGPEFFDQE        MHC Ia 51–63 analog   -35.34        1156
LLDILDTAGLEEYSAMRD     p21 51–66; C out      -35.27        202
QPYPQPQPFPSQQPY        A-gliadin 41-55       -35.26        1120
FPSQQPYLQLQPFPQ        A-gliadin 49-63       -33.93        20
CDGERPTLAFLQDVM        GAD 101–115           -33.57        69
SFPPQQPYPQPQPQY        A-gliadin 77-91       -33.35        370
SQDLELSWNLNGLQADLSS    FceR 104–122          -32.89        123
EPRAPWIEQEGPEYW        MHC Ia 46-63          -32.89        519
PPLYATGRLSQAQLMPSPPM   VP16                  -32.59        538
SQDLELSWNLNGLQAY       FceR 104–122 analog   -32.49        118
IARAKMFPAVAEK          34P3A                 -31.91        541
Binding registers 20/23 (87%) binding registers Only register (aa 4-12) from Test Set 2
(Der p 2: 1-20)
(SE=0.80; SP(LMH)=0.90)
Top 5 predictions are experimental positives at very stringent threshold criteria (SE=0.95; SP(H)=0.63)
T-cell proliferation
Multiple registers (SP=0.95, SE(LMH)=0.81): 58% of Test Set 1
[Figure: number of peptides (y-axis, 0 to 14) vs. number of binding registers (x-axis, 1 to 7) for weak, medium and strong binders]
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402
Is this why binding motifs are unsuccessful?
Introduction, Structural Immunoinformatics, Database development, Data Analysis, Computational models developed, Applications
Pemphigus vulgaris (PV)
• Autoimmune blistering skin disorder
• Characterized by autoantibodies targeting desmoglein-3 (Dsg3)
• Strong association with DR4 and DR6 alleles
http://www.medscape.com
adam.about.com
www.aafp.org
Who are the major players in PV?
• DR4 PV implicated alleles (for Semitic): DRB1*0401, DRB1*0402, DRB1*0404, DRB1*0406
• DR6 PV implicated alleles (for Caucasians): DRB1*1401, DRB1*1404, DRB1*1405, DQB1*0503
What is known about DR4?
DR4 PV implicated alleles (DRB1*0401, *0402, *0404, *0406)
• High sequence conservation: 97.9 – 99.0% identity; 98.4 – 99.5% similarity
• High structural conservation: Cα RMSD <0.22 Å for all key binding pockets
• 7 polymorphic residues within binding cleft: Pocket 1 (β86), Pocket 4 (β70, 71, 74), Pocket 6 (β11), Pocket 7 (β71), Pocket 9 (β37)
What is known about DR6?
DR6 PV implicated alleles (DRB1*1401, *1404, *1405, DQB1*0503)
• High sequence conservation: 85.8 – 94.1% identity; 83.2 – 97.3% similarity
• High structural conservation: Cα RMSD <0.22 Å for all key binding pockets
• 14 polymorphic residues within binding clefts: Pocket 1 (β86), Pocket 4 (β13, 70, 71, 74, 78), Pocket 6 (β11), Pocket 7 (β28, 30, 67, 71), Pocket 9 (β9, 37, 57, 60)
Clues…
9 stimulatory Dsg3 peptides tested on PV patients possessing DR4 and DR6 PV implicated alleles
1. Dsg3 96-112 (DR4, DR6)
2. Dsg3 191-205 (DR4, DR6)
3. Dsg3 206-220 (DR4, DR6)
4. Dsg3 252-266 (DR4, DR6)
5. Dsg3 342-356 (DR4, DR6)
6. Dsg3 380-394 (DR4, DR6)
7. Dsg3 763-777 (DR4, DR6)
8. Dsg3 810-824 (DR4)
9. Dsg3 963-977 (DR4)
Disease associated alleles vs. innocent bystanders
Tong et al. (2006) Immunome Research, 2: 1
DR4 PV
• 8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402
• Atomic clashes with all other investigated DR4 subtypes
DR6 PV
• 6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503
• Atomic clashes with all other investigated DR6 subtypes
• HLA association in DR6 PV more likely to be at DQ than DR locus
Consistent with experimental work done by Sinha et al. (2002, 2005, 2006)
Whither sequence motifs (again!)?
• 1/9 investigated Dsg3 peptides fits existing binding motifs
• Flanking residues: clashes in fitting binding register
• Register-shift for Peptide V (Dsg3 342-356)
  Detected binding register: Dsg3 346-354
  Binding motifs: Dsg3 347-355 (Veldman et al., 2003); Dsg3 345-353 (Sinha et al., 2006)
Large-scale screening of Dsg3 peptides
Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7
• Docking of 936 15mer Dsg3 peptides generated using a sliding window of size 15 across the entire Dsg3 glycoprotein
• Training set: 8 peptides each, with exp. IC50 values and known binding registers (5 binders and 3 non-binders)
[Figure: Dsg3 peptide (sliding window width 15, N to C terminus) containing a binding register (sliding window width 9) with flanking residues]
[Figure: binding energy (y-axis, -40.00 to -15.00) vs. 15-mer start position (x-axis, 50 to 1050, in panels of 200 positions) for DQB1*0503 and DRB1*0402, with the extracellular, transmembrane and intracellular regions and the immunoreactive region marked]
Large-scale screening of Dsg3 peptides
Common epitopes possibly responsible for inducing disease in DR4 & DR6 patients
• Significant level of cross reactivity observed between DRB1*0402 and DQB1*0503 (AROC=0.93)
• 57% of peptides investigated in this study predicted to bind to both alleles with high affinity
• 90% of known Dsg3 peptides predicted to bind to both alleles
• 12/20 top predicted DQB1*0503-specific Dsg3 peptides from transmembrane region
• All top predicted DRB1*0402-specific Dsg3 peptides from extracellular regions
• Disease initiation implications: DR4 from ECD; DR6 from TM
Multiple binding registers revisited
• 76% (410/539) predicted high-affinity binders to DRB1*0402 possess > 2 binding registers
• 57% (384/673) predicted high-affinity binders to DQB1*0503 possess > 2 binding registers
• 66% (354/539) bind both alleles at different registers
• Similar proportion (70%) detected in known binders to both alleles
Both alleles bind similar peptides via different binding registers
[Figure: number of peptides (y-axis, 0 to 350) vs. number of binding registers (x-axis, 0 to 6) for DQB1*0503 and DRB1*0402]
What next?
We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values.
The model yields excellent results for test data (AROC=0.93).
Application to determine immunological hot spots for HIV-1 p24gag and gp160gag glycoproteins shows binding energies similar to those for HLA-A and -B.
Conclusions
Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.
While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.
1. Genome analysis
Approaches
• EST analysis
• Annotation pipeline using workflow strategies
Applications
• Parasitic nematodes
• Cancer EST data
Outcomes
• Comprehensive annotation at the gene and protein levels
• Novel &/or pathogen-specific genes
• Immune response evasion strategies
2. Transcriptome analysis
Approaches
• Graph formalism for alternative splicing
• Genome-wide analysis
Applications
• Drosophila genome
• Chicken compared to human and mouse
• Kallikrein variants as markers
Outcomes
• New mRNA-gDNA alignment method, MGAlign & MGAlignIt
• First splicing graph database, DEDB
• Web server for splicing graphs, ASGS
• Sub-graph elements for alternative splicing
• Multi-species splicing graph database, GraphDB
3. Protein/Proteome research: Origin and evolution of structural domains
Approaches
• Intron mapping to domain boundary
• All eukaryotic proteins analyzed
Applications
• Domain prediction in EST/genome data
• Effect of splice variants on domains
Outcomes
• New database of protein coding genes, XPro
• Visualization of intronic locations on protein structural domains, XDomView
• Analysis tool, Go Module Viewer
3. Protein/Proteome research: Small disulfide-rich proteins (<100 aa per domain; ≥ 2 SS bonds)
Approaches
• Multiple structure alignment and hierarchical classification
• Comparative modeling rules
• Sequence, structure and evolutionary analysis of Potato II inhibitor family
Applications
• Design of protease inhibitors, channel modulators, growth regulators
Outcomes
• New database, DSFD
• Server for model building, SDPMOD
• Understanding of wound-induced protease inhibitor folding
3. Protein/Proteome research: Protease cleavage site prediction
Approaches
• Detailed structural modeling and docking of signal peptide moiety to signal peptidase I
• SVM for caspases
Applications
• Enhanced production of therapeutic and commercial heterologous proteins
• Apoptosis initiation
Outcomes
• New databases, SPdb, CasBase
• Server for caspase cleavage prediction, CASVM
• Signal peptide cleavage prediction (under development)
4. Systems Biology
Approaches
• Holistic computational, molecular biology and FRET study to locate secretion roadblocks
• EST analysis of host-parasite interactions
Applications
• Trichoderma reesei as fungal bioreactor
• Parasites that lead to liver cancer (food borne trematode, Opisthorchis viverrini) and bladder cancer (Schistosoma haematobium)
Outcomes
• Improved heterologous protein production using filamentous fungi
• Understanding of how parasites evade host immune activation
6. Genome-Phenome mapping
Approaches
• Mutation data for non-laboratory animals
• Mapping to OMIM
• Mapping to structure
Applications
• OMIA-OMIM mapping to structure
• Correlation between genotype and disease phenotype
Outcomes
• OMIA database, with links to OMIM (courtesy NCBI)
• Mutations linked to severity of disease for α-D-mannosidosis
• Predictions of new human disease mutations from known mutation sites in cow, cat and guinea pig
7. Biodiversity Informatics: Customary medicinal plants
Approaches
• Integrating, visualizing and analyzing ethnobotanical, phytochemical and pharmacological data on customary medicinal plants
• Data from Australian aboriginal elders and Indian Siddha doctors
Applications
• Novel antimicrobial, anti-inflammatory and anti-cancer lead compounds
Outcomes
• CMkb, an integrated knowledgebase
Dedications
Prof. Bernard Pullman
Mme. Alberte Pullman
My brother, a CML survivor
Acknowledgements
• Dr. (Victor) J.C. Tong, NUS & I2R, Singapore
• A/Prof. Tin Wee Tan, NUS
• Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA
• Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN266200400085C)
• All of you!