poster


Upload: abhik-seal

Post on 27-Jun-2015


DESCRIPTION

Poster presented at International Chemical Biology Society.


3D Virtual Screening of PknB Inhibitors using data fusion methods

Abhik Seal¹, Perumal Yogeeswari², Dharmaranjan Sriram², David J Wild¹, OSDD Consortium³

¹ School of Informatics and Computing, Indiana University, Bloomington, USA
² Department of Pharmacy, Birla Institute of Technology Hyderabad Campus, Shameerpet, Hyderabad-500078, India
³ Open Source Drug Discovery, Council of Scientific and Industrial Research, India



INTRODUCTION

Mycobacterium tuberculosis encodes 11 putative serine-threonine protein kinases (STPKs), which regulate transcription, cell development and interaction with host cells. Of the 11 STPKs, three kinases, namely PknA, PknB and PknG, have been related to mycobacterial growth. PknB has less than 27% sequence identity with eukaryotic kinases, yet its structure shows very low RMSDs of 1.36 Å and 1.72 Å against them. When developing the pharmacophore we found that the new compounds (Figure 1) in the pipeline resemble a typical kinase Class I type pharmacophore.


The selection of candidate pharmacophores was based on the enrichment results, % yield of actives, specificity and Goodness of Hit list (GH) score.
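The poster does not spell out how these hit-list statistics were calculated. The sketch below uses the commonly cited definitions (D = database size, A = actives in the database, Ht = compounds in the hit list, Ha = actives in the hit list) and should be read as an assumption about the calculation, not as the authors' code. The function name and the hit counts in the example are hypothetical; only the 35 actives and 1000 decoys come from the poster.

```python
def hit_list_metrics(total, actives, hits, active_hits):
    """Hit-list statistics from D (total), A (actives), Ht (hits), Ha (active_hits).

    Uses the commonly cited definitions; the poster does not state its formulas.
    """
    D, A, Ht, Ha = total, actives, hits, active_hits
    decoys = D - A            # inactive compounds in the database
    false_pos = Ht - Ha       # decoys that made it into the hit list
    if not (0 < A < D and 0 < Ht <= D and 0 <= Ha <= min(A, Ht)):
        raise ValueError("inconsistent counts")
    if false_pos > decoys:
        raise ValueError("more false positives than decoys")

    return {
        "yield_pct": 100.0 * Ha / Ht,                # % yield of actives
        "actives_pct": 100.0 * Ha / A,               # % of known actives retrieved
        "enrichment": (Ha / Ht) / (A / D),           # enrichment over random picking
        "specificity": (decoys - false_pos) / decoys,
        # Goodness of Hit list: weighted recall/precision term times a
        # penalty for false positives.
        "gh_score": (Ha * (3 * A + Ht)) / (4 * Ht * A) * (1 - false_pos / decoys),
    }


# Hypothetical example: a pharmacophore returns 50 hits, 30 of them active,
# from a validation set of 35 actives plus 1000 decoys.
example = hit_list_metrics(total=1035, actives=35, hits=50, active_hits=30)
```

A GH score of 1 corresponds to a hit list that contains every active and nothing else, so values closer to 1 indicate a better pharmacophore hypothesis under these definitions.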

Another objective of our screening was to determine how early in a virtual screening run the program can identify active compounds. We used the BEDROC and RIE metrics to measure this. In this work we used the scores and ranks from pharmacophore screening, shape-based screening and docking as input to data fusion ranking algorithms, namely sum rank, sum score and reciprocal rank. We found that the reciprocal rank algorithm performs best in selecting compounds "early" in a virtual screening process. We also screened the Asinex database of 400K compounds with the reciprocal rank algorithm to select 45 potential hits for PknB.
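As an illustration of how early recognition can be quantified, the following sketch computes RIE and BEDROC from the rank positions of the actives, following the definitions of Truchon et al. It is not the enrichVS code used for the poster. The value alpha = 20 is an assumed default (the poster does not report the alpha used), and the function names are hypothetical.

```python
import math


def rie(active_ranks, n_total, alpha=20.0):
    """Robust Initial Enhancement for actives at the given 1-based ranks."""
    n = len(active_ranks)
    if n == 0 or n >= n_total:
        raise ValueError("need at least one active and at least one decoy")
    # Average exponential weight of the actives; early ranks weigh more.
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
    # Expected average weight if the actives were spread uniformly at random.
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected


def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC: RIE rescaled to [0, 1] using its minimum and maximum values."""
    ra = len(active_ranks) / n_total  # fraction of actives in the list
    value = rie(active_ranks, n_total, alpha)
    rie_max = (1.0 - math.exp(-alpha * ra)) / (ra * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ra)) / (ra * (1.0 - math.exp(alpha)))
    return (value - rie_min) / (rie_max - rie_min)


# Validation set size from the poster: 35 actives + 1000 decoys.
n_total = 1035
best_case = bedroc(list(range(1, 36)), n_total)         # all actives ranked first
worst_case = bedroc(list(range(1001, 1036)), n_total)   # all actives ranked last
```

With this scaling, a method that ranks every active ahead of every decoy scores 1 and one that ranks them all last scores 0, which makes BEDROC values comparable across methods applied to the same validation set.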

METHODS

Datasets: 62 available inhibitors collected from the literature, the PknB protein (PDB ID: 2FUM) and a 1000-compound decoy dataset available from http://www.schrodinger.com/gli de decoy_set. A validation dataset of 35 actives (from the 62) and the 1000 decoys was prepared.

Tools used: Glide (docking), E-pharmacophore (Glide XP + Phase), ROCS (shape similarity), enrichVS (R package).

Pharmacophore: Glide XP descriptors were used for E-pharmacophore generation. E-pharmacophores I and II were derived from compound I and compound VIII respectively, because these compounds ranked in the top 2 in the docking program.

ROCS: 1000 conformations were generated for the validation dataset using a low energy cut-off of 5 kcal/mol and an RMSD of 0.6 Å for duplicate removal, as suggested by Bostrom et al. Compound VIII was taken as the query for ROCS-based virtual screening, and compounds were scored and ranked by Tanimoto Combo score.

Glide Docking: 2FUM was prepared using Maestro Prime with water molecules removed. A grid box of 12 Å was used for docking.

Fusion Algorithms:
• Sum score - The normalized scores from each method are summed to give the fused score of a compound.
• Sum rank - The ranks from each method are summed to give the fused rank of a compound.
• Reciprocal rank - Reciprocal rank combines the rank positions of a compound across the methods based on Equation 1.
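The sum score and sum rank rules can be sketched as follows. Min-max scaling is assumed for the normalization step because the poster does not name the scheme used, and the compound names and score values are invented for illustration. Docking scores are treated as "lower is better" and shape scores as "higher is better".

```python
def min_max_normalize(scores, higher_is_better=True):
    """Scale one method's scores to [0, 1] so that 1 is always the best value."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    scaled = {}
    for compound, value in scores.items():
        x = (value - lo) / span if span else 0.0
        scaled[compound] = x if higher_is_better else 1.0 - x
    return scaled


def ranks_from_scores(scores, higher_is_better=True):
    """Rank compounds within one method; rank 1 is the best compound."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {compound: rank for rank, compound in enumerate(ordered, start=1)}


def sum_score(methods):
    """Fused score = sum of normalized scores (higher fused score is better)."""
    fused = {}
    for scores, higher_is_better in methods:
        for compound, value in min_max_normalize(scores, higher_is_better).items():
            fused[compound] = fused.get(compound, 0.0) + value
    return fused


def sum_rank(methods):
    """Fused rank = sum of per-method ranks (lower fused rank is better)."""
    fused = {}
    for scores, higher_is_better in methods:
        for compound, rank in ranks_from_scores(scores, higher_is_better).items():
            fused[compound] = fused.get(compound, 0) + rank
    return fused


# Hypothetical scores for three compounds from two methods.
docking = {"A": -9.0, "B": -7.0, "C": -5.0}   # docking score: lower is better
shape = {"A": 1.4, "B": 1.6, "C": 0.8}        # shape similarity: higher is better
methods = [(docking, False), (shape, True)]

fused_scores = sum_score(methods)
fused_ranks = sum_rank(methods)
```

In this toy case compounds A and B tie under sum rank but are separated by sum score, which shows that the two rules can order the same compounds differently.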

Equation 1. Reciprocal rank fusion score
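A minimal sketch of reciprocal rank fusion, assuming the form given by Nuray et al., in which the fused score of a compound is the reciprocal of the sum of its reciprocal rank positions across methods. The function name, compound names and rank values are hypothetical.

```python
def reciprocal_rank_fusion(rankings):
    """Fuse several rankings into one score per compound.

    rankings: list of dicts mapping compound -> position (1 = best) in one method.
    Returns compound -> 1 / sum_j (1 / position_j); a smaller value is better.
    """
    compounds = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != compounds:
            raise ValueError("every method must rank the same compounds")
    return {
        compound: 1.0 / sum(1.0 / ranking[compound] for ranking in rankings)
        for compound in compounds
    }


# Hypothetical rank positions from docking, shape and pharmacophore screening.
docking_pos = {"A": 1, "B": 2, "C": 3}
shape_pos = {"A": 2, "B": 1, "C": 3}
pharmacophore_pos = {"A": 1, "B": 3, "C": 2}

fused = reciprocal_rank_fusion([docking_pos, shape_pos, pharmacophore_pos])
final_order = sorted(fused, key=fused.get)  # best compound first
```

Because the reciprocals are summed, a single top position in any one method pulls a compound strongly toward the front of the fused list, which fits the emphasis on early recognition.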

Figure 1 Workflow of data fusion

CONCLUSIONS

• The data fusion algorithms used here identify active compounds “early” in a virtual screening process. In this work reciprocal rank had the best performance.

• Data fusion reduces dependence on a single tool for virtual screening.

• The reciprocal rank algorithm selected most of the active compounds early in the virtual screening process, with a very high BEDROC score.

• Optimization of the E-pharmacophore is crucial for identifying the most important sites of a pharmacophore.

• Random forest models based on MACCS keys, 2D pharmacophore fingerprints and CATS descriptors were tested on the Asinex dataset. None of these methods was able to select the compounds from the 3D screening sets, which suggests possible lead hopping by the 3D methods.

• A list of 45 compounds was finally selected for experimental validation.

RESULTS

E-Pharmacophores: E-pharmacophore II was optimized to E-pharmacophore III based on the % yield of actives, specificity and GH score.

Table 1 Results for the different E-pharmacophores

Table 2 Statistical measures of structure-based, ligand-based and data fusion methods

Figure 3 a) Performance metrics of structure-based and ligand-based methods with data fusion. b) PCA plot of predicted inhibitors with the PknB inhibitors.

References

• Salam et al. J. Chem. Inf. Model. 2009, 49, 2356–2368.
• Truchon et al. J. Chem. Inf. Model. 2007, 47, 488–508.
• Svensson et al. J. Chem. Inf. Model. 2012, 52 (1), 225–232.
• Nuray, R. et al. Information Processing and Management 2006, 42, 595–614.
• Zuccotto et al. J. Med. Chem. 2010, 53, 2681–2694.

Acknowledgements

• Indo-US Science and Technology Forum for providing travel grant and stipend.
• Open Source Drug Discovery for providing publication charges.
• Birla Institute of Technology Hyderabad Campus, India.

1st Official conference of the International Chemical Biology Society Oct 4-5 2012 Cambridge, MA, USA

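Equation 1 (reciprocal rank fusion score), where pos(C_ij) is the rank position of compound C_i in method j:

\[
r(C_i) = \frac{1}{\sum_{j} \dfrac{1}{\mathrm{pos}(C_{ij})}}
\]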